Note
Go to the end to download the full example code.
Mosaic: Memory Profiling for PyTorch#
Author: Basil Wong

What you will learn:

Capture and analyze PyTorch memory snapshots
Identify memory savings from activation checkpointing
Debug unexpected memory usage from abandoned code
Integrate memory analysis into training pipelines

Prerequisites:

PyTorch v2.0.0 or later
CUDA-capable GPU
Basic understanding of PyTorch training loops
This tutorial demonstrates how to use Mosaic, a post-processing memory snapshot analysis tool for PyTorch. Mosaic helps analyze GPU memory usage in distributed deep learning, providing detailed insights into memory allocations, peak usage, and memory imbalances across parallel workers.
Mosaic was instrumental in debugging OOM issues during the 405B LLaMA training and is now open source.
Introduction to Mosaic#
Overview#
In distributed deep learning, understanding GPU memory usage is critical for optimizing training efficiency and debugging Out-of-Memory (OOM) errors. Mosaic is a post-analysis tool for memory usage designed to work with large-scale jobs. It helps analyze PyTorch memory snapshots captured during the execution of PyTorch training jobs, providing detailed insights into memory allocations, peak usage, and memory imbalances across parallel workers.
Getting Started#
Clone the mosaic repository and install from the mosaic directory:
git clone https://github.com/facebookresearch/mosaic
cd mosaic
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
pip3 install -e .
Alternatively, install directly via pip:
pip install git+https://github.com/facebookresearch/mosaic.git
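To quickly verify the installation, you can check that the Python package imports and that the CLI entry points used later in this tutorial are on your PATH (a small sanity check, not part of Mosaic itself):

import shutil

# Python API used in the examples below
from mosaic.libmosaic.analyzer.memory_abstract import MemoryAbstract  # noqa: F401

# CLI entry points referenced in this tutorial
for cli in ("mosaic_get_memory_usage_peak", "mosaic_get_memory_profile"):
    print(f"{cli}: {'found' if shutil.which(cli) else 'NOT found'}")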
Simple Usage Examples#
1. Peak Memory Usage Analysis
When addressing memory problems like OOM errors, focusing on peak memory
usage is crucial. The mosaic_get_memory_usage_peak command presents a
stack trace of the memory allocations that contributed to the peak memory
usage:
mosaic_get_memory_usage_peak --snapshot <path to snapshot>
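The <path to snapshot> argument points at a pickle file produced by PyTorch's built-in memory history recorder. As a rough sketch (using PyTorch's torch.cuda.memory hooks; run_workload is a placeholder for your own training step), a snapshot can be captured like this:

import torch

# Start recording allocation events (stack traces are included in recent PyTorch versions)
torch.cuda.memory._record_memory_history(max_entries=100_000)

run_workload()  # placeholder for the code you want to profile

# Dump the snapshot that Mosaic's CLI tools consume, then stop recording
torch.cuda.memory._dump_snapshot("snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)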
2. Categorical Memory Profiling
Mosaic classifies allocations into categories (activation, backward, optimizer, etc.):
Activation Memory: Tensors saved for backward pass
Gradient Memory: Gradients computed during backpropagation
Optimizer State: Adam/SGD momentum and variance buffers
Parameter Memory: Model weights
mosaic_get_memory_profile --snapshot <path> --out-path <html> \
--profile categories
An example HTML output looks like:
Categorical memory profiling showing memory breakdown by type (activation, gradient, optimizer, etc.)#
To maintain allocation order for the categories, add --preserve-allocation-order:
mosaic_get_memory_profile --snapshot <path> --out-path <html> \
--profile categories --preserve-allocation-order
Categorical profiling with --preserve-allocation-order shows memory allocations in chronological order#
3. Custom Dictionary Profiling
For targeted analysis via regex pattern matching:
mosaic_get_memory_profile --snapshot <path> --profile custom \
--custom-profile '{"ncclx": "ncclx"}'
This is invaluable for tracking specific kernels, optimizers, or custom code patterns:
Custom profiling with regex patterns to track specific operations like NCCL communications#
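If the custom profile accepts multiple key/regex pairs (the JSON mapping format above suggests it does), several patterns could be tracked in one run; the category names and patterns here are illustrative, not from the original tutorial:

mosaic_get_memory_profile --snapshot <path> --out-path <html> --profile custom \
    --custom-profile '{"attention": "attn", "optimizer": "adam"}'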
Dependencies and Imports#
Let’s set up the required dependencies and imports for this tutorial.
import subprocess
import sys
import shutil
from contextlib import contextmanager
import pickle

# Fix for sphinx-gallery environment where __main__.__file__ may not exist
# This is needed for transformers library compatibility
import os

if not hasattr(sys.modules["__main__"], "__file__"):
    # Use this file's path as a fallback, or a dummy path if __file__ is not available
    try:
        sys.modules["__main__"].__file__ = os.path.abspath(__file__)
    except NameError:
        # __file__ not available, use transformers modeling file as fallback
        import transformers.modeling_utils

        sys.modules["__main__"].__file__ = transformers.modeling_utils.__file__

import torch
from torch.utils.data import DataLoader, Dataset

# Install dependencies if needed
try:
    from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer
    from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions
except ImportError:
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-q", "transformers"]
    )
    from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer
    from transformers.modeling_outputs import CausalLMOutputWithCrossAttentions

try:
    from mosaic.libmosaic.analyzer.memory_abstract import MemoryAbstract
except ImportError:
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "-q",
            "git+https://github.com/facebookresearch/mosaic.git",
        ]
    )
    from mosaic.libmosaic.analyzer.memory_abstract import MemoryAbstract

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
PyTorch version: 2.10.0+cu128
CUDA available: True
GPU: NVIDIA A10G
Case 1: Understanding Memory Differences with Activation Checkpointing#
This section demonstrates how to use Mosaic to analyze and compare GPU memory usage between different model configurations.
What we’ll do:
Train GPT-2 and capture a memory snapshot (baseline)
Enable activation checkpointing and train again (modified)
Use Mosaic to identify exactly where memory savings occur
Training Function for Activation Checkpointing Comparison#
def run_training_ac(
    activation_checkpointing: bool,
    snapshot_path: str,
    batch_size: int = 4,
    seq_length: int = 512,
    num_steps: int = 5,
):
    """Run training loop and capture memory snapshot.

    Args:
        activation_checkpointing: Whether to enable gradient checkpointing.
        snapshot_path: Path to save the memory snapshot.
        batch_size: Training batch size.
        seq_length: Sequence length for input tokens.
        num_steps: Number of training steps to run.

    Returns:
        Peak GPU memory usage in GB.
    """
    # Clear any previous memory
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    device = torch.device("cuda")

    # Load model
    print(f"Loading GPT-2 (activation_checkpointing={activation_checkpointing})...")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    if activation_checkpointing:
        model.gradient_checkpointing_enable()
        print("Activation checkpointing is ENABLED")
    else:
        print("Activation checkpointing is DISABLED")

    model = model.to(device)
    model.train()

    # Create dataset and dataloader
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    dataset = RandomTokenDataset(
        vocab_size=tokenizer.vocab_size,
        seq_length=seq_length,
        num_samples=100,
    )
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Setup optimizer
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Training loop with memory capture
    print(f"Running {num_steps} training steps...")
    with capture_memory_snapshot(snapshot_path):
        for step, batch in enumerate(dataloader):
            if step >= num_steps:
                break

            batch = {k: v.to(device) for k, v in batch.items()}

            optimizer.zero_grad()
            outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
            loss = outputs.loss
            loss.backward()
            optimizer.step()

            print(f"  Step {step + 1}/{num_steps}, Loss: {loss.item():.4f}")

    peak_memory_gb = torch.cuda.max_memory_allocated() / (1024**3)
    print(f"✓ Peak GPU memory: {peak_memory_gb:.2f} GB")

    # Cleanup
    del model, optimizer
    torch.cuda.empty_cache()

    return peak_memory_gb
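The function above relies on two helpers defined elsewhere in the full tutorial: RandomTokenDataset, which feeds random token ids, and capture_memory_snapshot, which records allocator history around the training loop. A minimal sketch of what they might look like, assuming the same torch.cuda.memory hooks shown earlier and the imports from the dependencies section, is:

class RandomTokenDataset(Dataset):
    """Synthetic dataset of random token ids with matching labels (sketch)."""

    def __init__(self, vocab_size: int, seq_length: int, num_samples: int):
        self.vocab_size = vocab_size
        self.seq_length = seq_length
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        input_ids = torch.randint(0, self.vocab_size, (self.seq_length,))
        return {"input_ids": input_ids, "labels": input_ids.clone()}


@contextmanager
def capture_memory_snapshot(snapshot_path: str):
    """Record CUDA allocator history and dump a snapshot on exit (sketch)."""
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    try:
        yield
    finally:
        torch.cuda.memory._dump_snapshot(snapshot_path)
        torch.cuda.memory._record_memory_history(enabled=None)
        print(f"✓ Memory snapshot saved to {snapshot_path}")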
Run Baseline Training (Without Activation Checkpointing)#
Note
This tutorial requires a CUDA-capable GPU. If you’re running in Google Colab, make sure to select a GPU runtime: Runtime → Change runtime type → Hardware accelerator → GPU
if not torch.cuda.is_available():
    print("=" * 60)
    print("WARNING: No CUDA GPU detected!")
    print("=" * 60)
    print("\nThis tutorial requires a CUDA-capable GPU for memory profiling.")
    print("\nIf you're running in Google Colab:")
    print("  1. Go to Runtime → Change runtime type")
    print("  2. Set Hardware accelerator to 'GPU'")
    print("  3. Click 'Save' and re-run the notebook")
    print("\nSkipping GPU memory profiling examples...")
    HAS_CUDA = False
else:
    HAS_CUDA = True

# Check if Mosaic CLI is available
HAS_MOSAIC_CLI = shutil.which("mosaic_get_memory_profile") is not None
if HAS_CUDA and not HAS_MOSAIC_CLI:
    print("Note: Mosaic CLI not found. Install Mosaic to generate HTML profiles.")
    print("      pip install git+https://github.com/facebookresearch/mosaic.git")

if HAS_CUDA:
    print("=" * 60)
    print("BASELINE: Training WITHOUT Activation Checkpointing")
    print("=" * 60)

    baseline_memory = run_training_ac(
        activation_checkpointing=False,
        snapshot_path="snapshot_baseline.pickle",
        batch_size=4,
        seq_length=512,
        num_steps=5,
    )
============================================================
BASELINE: Training WITHOUT Activation Checkpointing
============================================================
Loading GPT-2 (activation_checkpointing=False)...
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1502.29it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | |
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Activation checkpointing is DISABLED
Running 5 training steps...
`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
Step 1/5, Loss: 12.2491
Step 2/5, Loss: 12.0704
Step 3/5, Loss: 11.9368
Step 4/5, Loss: 11.7790
Step 5/5, Loss: 11.8115
✓ Memory snapshot saved to snapshot_baseline.pickle
✓ Peak GPU memory: 5.12 GB
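With snapshot_baseline.pickle now on disk, you can already inspect where the baseline peak comes from using the CLI command introduced earlier (invocation shown for this tutorial's snapshot file name):

mosaic_get_memory_usage_peak --snapshot snapshot_baseline.pickle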
Run Modified Training (With Activation Checkpointing)#
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("MODIFIED: Training WITH Activation Checkpointing")
    print("=" * 60)

    ac_memory = run_training_ac(
        activation_checkpointing=True,
        snapshot_path="snapshot_with_ac.pickle",
        batch_size=4,
        seq_length=512,
        num_steps=5,
    )

    # Summary
    print("\n" + "=" * 60)
    print("MEMORY COMPARISON SUMMARY")
    print("=" * 60)
    print(f"Baseline (no AC): {baseline_memory:.2f} GB")
    print(f"With AC: {ac_memory:.2f} GB")
    if baseline_memory > 0:
        saved_pct = 100 * (baseline_memory - ac_memory) / baseline_memory
        print(
            f"Memory Saved: {baseline_memory - ac_memory:.2f} GB ({saved_pct:.1f}%)"
        )
============================================================
MODIFIED: Training WITH Activation Checkpointing
============================================================
Loading GPT-2 (activation_checkpointing=True)...
Loading weights: 49%|████▊ | 72/148 [00:00<00:00, 1097.57it/s, Materializing param=transformer.h.5.mlp.c_proj.weight]
Loading weights: 49%|████▉ | 73/148 [00:00<00:00, 1109.27it/s, Materializing param=transformer.h.6.attn.c_attn.bias]
Loading weights: 49%|████▉ | 73/148 [00:00<00:00, 1106.66it/s, Materializing param=transformer.h.6.attn.c_attn.bias]
Loading weights: 50%|█████ | 74/148 [00:00<00:00, 1118.27it/s, Materializing param=transformer.h.6.attn.c_attn.weight]
Loading weights: 50%|█████ | 74/148 [00:00<00:00, 1115.67it/s, Materializing param=transformer.h.6.attn.c_attn.weight]
Loading weights: 51%|█████ | 75/148 [00:00<00:00, 1127.17it/s, Materializing param=transformer.h.6.attn.c_proj.bias]
Loading weights: 51%|█████ | 75/148 [00:00<00:00, 1124.53it/s, Materializing param=transformer.h.6.attn.c_proj.bias]
Loading weights: 51%|█████▏ | 76/148 [00:00<00:00, 1135.88it/s, Materializing param=transformer.h.6.attn.c_proj.weight]
Loading weights: 51%|█████▏ | 76/148 [00:00<00:00, 1133.24it/s, Materializing param=transformer.h.6.attn.c_proj.weight]
Loading weights: 52%|█████▏ | 77/148 [00:00<00:00, 1144.64it/s, Materializing param=transformer.h.6.ln_1.bias]
Loading weights: 52%|█████▏ | 77/148 [00:00<00:00, 1142.01it/s, Materializing param=transformer.h.6.ln_1.bias]
Loading weights: 53%|█████▎ | 78/148 [00:00<00:00, 1153.42it/s, Materializing param=transformer.h.6.ln_1.weight]
Loading weights: 53%|█████▎ | 78/148 [00:00<00:00, 1150.82it/s, Materializing param=transformer.h.6.ln_1.weight]
Loading weights: 53%|█████▎ | 79/148 [00:00<00:00, 1162.15it/s, Materializing param=transformer.h.6.ln_2.bias]
Loading weights: 53%|█████▎ | 79/148 [00:00<00:00, 1159.56it/s, Materializing param=transformer.h.6.ln_2.bias]
Loading weights: 54%|█████▍ | 80/148 [00:00<00:00, 1170.57it/s, Materializing param=transformer.h.6.ln_2.weight]
Loading weights: 54%|█████▍ | 80/148 [00:00<00:00, 1167.62it/s, Materializing param=transformer.h.6.ln_2.weight]
Loading weights: 55%|█████▍ | 81/148 [00:00<00:00, 1178.62it/s, Materializing param=transformer.h.6.mlp.c_fc.bias]
Loading weights: 55%|█████▍ | 81/148 [00:00<00:00, 1175.98it/s, Materializing param=transformer.h.6.mlp.c_fc.bias]
Loading weights: 55%|█████▌ | 82/148 [00:00<00:00, 1186.98it/s, Materializing param=transformer.h.6.mlp.c_fc.weight]
Loading weights: 55%|█████▌ | 82/148 [00:00<00:00, 1184.34it/s, Materializing param=transformer.h.6.mlp.c_fc.weight]
Loading weights: 56%|█████▌ | 83/148 [00:00<00:00, 1195.11it/s, Materializing param=transformer.h.6.mlp.c_proj.bias]
Loading weights: 56%|█████▌ | 83/148 [00:00<00:00, 1192.44it/s, Materializing param=transformer.h.6.mlp.c_proj.bias]
Loading weights: 57%|█████▋ | 84/148 [00:00<00:00, 1203.14it/s, Materializing param=transformer.h.6.mlp.c_proj.weight]
Loading weights: 57%|█████▋ | 84/148 [00:00<00:00, 1200.46it/s, Materializing param=transformer.h.6.mlp.c_proj.weight]
Loading weights: 57%|█████▋ | 85/148 [00:00<00:00, 1211.23it/s, Materializing param=transformer.h.7.attn.c_attn.bias]
Loading weights: 57%|█████▋ | 85/148 [00:00<00:00, 1208.56it/s, Materializing param=transformer.h.7.attn.c_attn.bias]
Loading weights: 58%|█████▊ | 86/148 [00:00<00:00, 1219.18it/s, Materializing param=transformer.h.7.attn.c_attn.weight]
Loading weights: 58%|█████▊ | 86/148 [00:00<00:00, 1216.53it/s, Materializing param=transformer.h.7.attn.c_attn.weight]
Loading weights: 59%|█████▉ | 87/148 [00:00<00:00, 1227.11it/s, Materializing param=transformer.h.7.attn.c_proj.bias]
Loading weights: 59%|█████▉ | 87/148 [00:00<00:00, 1223.81it/s, Materializing param=transformer.h.7.attn.c_proj.bias]
Loading weights: 59%|█████▉ | 88/148 [00:00<00:00, 1234.25it/s, Materializing param=transformer.h.7.attn.c_proj.weight]
Loading weights: 59%|█████▉ | 88/148 [00:00<00:00, 1231.29it/s, Materializing param=transformer.h.7.attn.c_proj.weight]
Loading weights: 60%|██████ | 89/148 [00:00<00:00, 1241.10it/s, Materializing param=transformer.h.7.ln_1.bias]
Loading weights: 60%|██████ | 89/148 [00:00<00:00, 1237.87it/s, Materializing param=transformer.h.7.ln_1.bias]
Loading weights: 61%|██████ | 90/148 [00:00<00:00, 1247.53it/s, Materializing param=transformer.h.7.ln_1.weight]
Loading weights: 61%|██████ | 90/148 [00:00<00:00, 1244.08it/s, Materializing param=transformer.h.7.ln_1.weight]
Loading weights: 61%|██████▏ | 91/148 [00:00<00:00, 1253.03it/s, Materializing param=transformer.h.7.ln_2.bias]
Loading weights: 61%|██████▏ | 91/148 [00:00<00:00, 1249.88it/s, Materializing param=transformer.h.7.ln_2.bias]
Loading weights: 62%|██████▏ | 92/148 [00:00<00:00, 1260.02it/s, Materializing param=transformer.h.7.ln_2.weight]
Loading weights: 62%|██████▏ | 92/148 [00:00<00:00, 1257.33it/s, Materializing param=transformer.h.7.ln_2.weight]
Loading weights: 63%|██████▎ | 93/148 [00:00<00:00, 1267.45it/s, Materializing param=transformer.h.7.mlp.c_fc.bias]
Loading weights: 63%|██████▎ | 93/148 [00:00<00:00, 1264.78it/s, Materializing param=transformer.h.7.mlp.c_fc.bias]
Loading weights: 64%|██████▎ | 94/148 [00:00<00:00, 1274.77it/s, Materializing param=transformer.h.7.mlp.c_fc.weight]
Loading weights: 64%|██████▎ | 94/148 [00:00<00:00, 1272.09it/s, Materializing param=transformer.h.7.mlp.c_fc.weight]
Loading weights: 64%|██████▍ | 95/148 [00:00<00:00, 1282.03it/s, Materializing param=transformer.h.7.mlp.c_proj.bias]
Loading weights: 64%|██████▍ | 95/148 [00:00<00:00, 1279.38it/s, Materializing param=transformer.h.7.mlp.c_proj.bias]
Loading weights: 65%|██████▍ | 96/148 [00:00<00:00, 1289.25it/s, Materializing param=transformer.h.7.mlp.c_proj.weight]
Loading weights: 65%|██████▍ | 96/148 [00:00<00:00, 1286.59it/s, Materializing param=transformer.h.7.mlp.c_proj.weight]
Loading weights: 66%|██████▌ | 97/148 [00:00<00:00, 1296.34it/s, Materializing param=transformer.h.8.attn.c_attn.bias]
Loading weights: 66%|██████▌ | 97/148 [00:00<00:00, 1293.36it/s, Materializing param=transformer.h.8.attn.c_attn.bias]
Loading weights: 66%|██████▌ | 98/148 [00:00<00:00, 1302.98it/s, Materializing param=transformer.h.8.attn.c_attn.weight]
Loading weights: 66%|██████▌ | 98/148 [00:00<00:00, 1300.25it/s, Materializing param=transformer.h.8.attn.c_attn.weight]
Loading weights: 67%|██████▋ | 99/148 [00:00<00:00, 1309.89it/s, Materializing param=transformer.h.8.attn.c_proj.bias]
Loading weights: 67%|██████▋ | 99/148 [00:00<00:00, 1307.20it/s, Materializing param=transformer.h.8.attn.c_proj.bias]
Loading weights: 68%|██████▊ | 100/148 [00:00<00:00, 1316.74it/s, Materializing param=transformer.h.8.attn.c_proj.weight]
Loading weights: 68%|██████▊ | 100/148 [00:00<00:00, 1314.04it/s, Materializing param=transformer.h.8.attn.c_proj.weight]
Loading weights: 68%|██████▊ | 101/148 [00:00<00:00, 1323.41it/s, Materializing param=transformer.h.8.ln_1.bias]
Loading weights: 68%|██████▊ | 101/148 [00:00<00:00, 1320.72it/s, Materializing param=transformer.h.8.ln_1.bias]
Loading weights: 69%|██████▉ | 102/148 [00:00<00:00, 1330.25it/s, Materializing param=transformer.h.8.ln_1.weight]
Loading weights: 69%|██████▉ | 102/148 [00:00<00:00, 1327.56it/s, Materializing param=transformer.h.8.ln_1.weight]
Loading weights: 70%|██████▉ | 103/148 [00:00<00:00, 1337.03it/s, Materializing param=transformer.h.8.ln_2.bias]
Loading weights: 70%|██████▉ | 103/148 [00:00<00:00, 1334.35it/s, Materializing param=transformer.h.8.ln_2.bias]
Loading weights: 70%|███████ | 104/148 [00:00<00:00, 1343.75it/s, Materializing param=transformer.h.8.ln_2.weight]
Loading weights: 70%|███████ | 104/148 [00:00<00:00, 1340.82it/s, Materializing param=transformer.h.8.ln_2.weight]
Loading weights: 71%|███████ | 105/148 [00:00<00:00, 1349.89it/s, Materializing param=transformer.h.8.mlp.c_fc.bias]
Loading weights: 71%|███████ | 105/148 [00:00<00:00, 1347.16it/s, Materializing param=transformer.h.8.mlp.c_fc.bias]
Loading weights: 72%|███████▏ | 106/148 [00:00<00:00, 1356.42it/s, Materializing param=transformer.h.8.mlp.c_fc.weight]
Loading weights: 72%|███████▏ | 106/148 [00:00<00:00, 1353.69it/s, Materializing param=transformer.h.8.mlp.c_fc.weight]
Loading weights: 72%|███████▏ | 107/148 [00:00<00:00, 1362.84it/s, Materializing param=transformer.h.8.mlp.c_proj.bias]
Loading weights: 72%|███████▏ | 107/148 [00:00<00:00, 1360.13it/s, Materializing param=transformer.h.8.mlp.c_proj.bias]
Loading weights: 73%|███████▎ | 108/148 [00:00<00:00, 1369.25it/s, Materializing param=transformer.h.8.mlp.c_proj.weight]
Loading weights: 73%|███████▎ | 108/148 [00:00<00:00, 1366.56it/s, Materializing param=transformer.h.8.mlp.c_proj.weight]
Loading weights: 74%|███████▎ | 109/148 [00:00<00:00, 1375.60it/s, Materializing param=transformer.h.9.attn.c_attn.bias]
Loading weights: 74%|███████▎ | 109/148 [00:00<00:00, 1372.88it/s, Materializing param=transformer.h.9.attn.c_attn.bias]
Loading weights: 74%|███████▍ | 110/148 [00:00<00:00, 1381.85it/s, Materializing param=transformer.h.9.attn.c_attn.weight]
Loading weights: 74%|███████▍ | 110/148 [00:00<00:00, 1379.17it/s, Materializing param=transformer.h.9.attn.c_attn.weight]
Loading weights: 75%|███████▌ | 111/148 [00:00<00:00, 1388.13it/s, Materializing param=transformer.h.9.attn.c_proj.bias]
Loading weights: 75%|███████▌ | 111/148 [00:00<00:00, 1385.47it/s, Materializing param=transformer.h.9.attn.c_proj.bias]
Loading weights: 76%|███████▌ | 112/148 [00:00<00:00, 1394.11it/s, Materializing param=transformer.h.9.attn.c_proj.weight]
Loading weights: 76%|███████▌ | 112/148 [00:00<00:00, 1391.18it/s, Materializing param=transformer.h.9.attn.c_proj.weight]
Loading weights: 76%|███████▋ | 113/148 [00:00<00:00, 1399.96it/s, Materializing param=transformer.h.9.ln_1.bias]
Loading weights: 76%|███████▋ | 113/148 [00:00<00:00, 1397.30it/s, Materializing param=transformer.h.9.ln_1.bias]
Loading weights: 77%|███████▋ | 114/148 [00:00<00:00, 1405.71it/s, Materializing param=transformer.h.9.ln_1.weight]
Loading weights: 77%|███████▋ | 114/148 [00:00<00:00, 1403.01it/s, Materializing param=transformer.h.9.ln_1.weight]
Loading weights: 78%|███████▊ | 115/148 [00:00<00:00, 1411.82it/s, Materializing param=transformer.h.9.ln_2.bias]
Loading weights: 78%|███████▊ | 115/148 [00:00<00:00, 1409.13it/s, Materializing param=transformer.h.9.ln_2.bias]
Loading weights: 78%|███████▊ | 116/148 [00:00<00:00, 1417.81it/s, Materializing param=transformer.h.9.ln_2.weight]
Loading weights: 78%|███████▊ | 116/148 [00:00<00:00, 1415.13it/s, Materializing param=transformer.h.9.ln_2.weight]
Loading weights: 79%|███████▉ | 117/148 [00:00<00:00, 1423.77it/s, Materializing param=transformer.h.9.mlp.c_fc.bias]
Loading weights: 79%|███████▉ | 117/148 [00:00<00:00, 1421.10it/s, Materializing param=transformer.h.9.mlp.c_fc.bias]
Loading weights: 80%|███████▉ | 118/148 [00:00<00:00, 1429.64it/s, Materializing param=transformer.h.9.mlp.c_fc.weight]
Loading weights: 80%|███████▉ | 118/148 [00:00<00:00, 1426.96it/s, Materializing param=transformer.h.9.mlp.c_fc.weight]
Loading weights: 80%|████████ | 119/148 [00:00<00:00, 1435.50it/s, Materializing param=transformer.h.9.mlp.c_proj.bias]
Loading weights: 80%|████████ | 119/148 [00:00<00:00, 1432.83it/s, Materializing param=transformer.h.9.mlp.c_proj.bias]
Loading weights: 81%|████████ | 120/148 [00:00<00:00, 1441.28it/s, Materializing param=transformer.h.9.mlp.c_proj.weight]
Loading weights: 81%|████████ | 120/148 [00:00<00:00, 1438.37it/s, Materializing param=transformer.h.9.mlp.c_proj.weight]
Loading weights: 82%|████████▏ | 121/148 [00:00<00:00, 1446.52it/s, Materializing param=transformer.h.10.attn.c_attn.bias]
Loading weights: 82%|████████▏ | 121/148 [00:00<00:00, 1443.80it/s, Materializing param=transformer.h.10.attn.c_attn.bias]
Loading weights: 82%|████████▏ | 122/148 [00:00<00:00, 1452.12it/s, Materializing param=transformer.h.10.attn.c_attn.weight]
Loading weights: 82%|████████▏ | 122/148 [00:00<00:00, 1449.21it/s, Materializing param=transformer.h.10.attn.c_attn.weight]
Loading weights: 83%|████████▎ | 123/148 [00:00<00:00, 1457.45it/s, Materializing param=transformer.h.10.attn.c_proj.bias]
Loading weights: 83%|████████▎ | 123/148 [00:00<00:00, 1454.73it/s, Materializing param=transformer.h.10.attn.c_proj.bias]
Loading weights: 84%|████████▍ | 124/148 [00:00<00:00, 1462.95it/s, Materializing param=transformer.h.10.attn.c_proj.weight]
Loading weights: 84%|████████▍ | 124/148 [00:00<00:00, 1460.29it/s, Materializing param=transformer.h.10.attn.c_proj.weight]
Loading weights: 84%|████████▍ | 125/148 [00:00<00:00, 1468.45it/s, Materializing param=transformer.h.10.ln_1.bias]
Loading weights: 84%|████████▍ | 125/148 [00:00<00:00, 1465.77it/s, Materializing param=transformer.h.10.ln_1.bias]
Loading weights: 85%|████████▌ | 126/148 [00:00<00:00, 1474.04it/s, Materializing param=transformer.h.10.ln_1.weight]
Loading weights: 85%|████████▌ | 126/148 [00:00<00:00, 1471.37it/s, Materializing param=transformer.h.10.ln_1.weight]
Loading weights: 86%|████████▌ | 127/148 [00:00<00:00, 1479.59it/s, Materializing param=transformer.h.10.ln_2.bias]
Loading weights: 86%|████████▌ | 127/148 [00:00<00:00, 1476.94it/s, Materializing param=transformer.h.10.ln_2.bias]
Loading weights: 86%|████████▋ | 128/148 [00:00<00:00, 1485.08it/s, Materializing param=transformer.h.10.ln_2.weight]
Loading weights: 86%|████████▋ | 128/148 [00:00<00:00, 1482.43it/s, Materializing param=transformer.h.10.ln_2.weight]
Loading weights: 87%|████████▋ | 129/148 [00:00<00:00, 1489.69it/s, Materializing param=transformer.h.10.mlp.c_fc.bias]
Loading weights: 87%|████████▋ | 129/148 [00:00<00:00, 1486.71it/s, Materializing param=transformer.h.10.mlp.c_fc.bias]
Loading weights: 88%|████████▊ | 130/148 [00:00<00:00, 1494.00it/s, Materializing param=transformer.h.10.mlp.c_fc.weight]
Loading weights: 88%|████████▊ | 130/148 [00:00<00:00, 1491.23it/s, Materializing param=transformer.h.10.mlp.c_fc.weight]
Loading weights: 89%|████████▊ | 131/148 [00:00<00:00, 1498.52it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights: 89%|████████▊ | 131/148 [00:00<00:00, 1495.72it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights: 89%|████████▉ | 132/148 [00:00<00:00, 1502.91it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights: 89%|████████▉ | 132/148 [00:00<00:00, 1500.15it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights: 90%|████████▉ | 133/148 [00:00<00:00, 1507.27it/s, Materializing param=transformer.h.11.attn.c_attn.bias]
Loading weights: 90%|████████▉ | 133/148 [00:00<00:00, 1504.51it/s, Materializing param=transformer.h.11.attn.c_attn.bias]
Loading weights: 91%|█████████ | 134/148 [00:00<00:00, 1511.76it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights: 91%|█████████ | 134/148 [00:00<00:00, 1508.95it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights: 91%|█████████ | 135/148 [00:00<00:00, 1516.19it/s, Materializing param=transformer.h.11.attn.c_proj.bias]
Loading weights: 91%|█████████ | 135/148 [00:00<00:00, 1513.46it/s, Materializing param=transformer.h.11.attn.c_proj.bias]
Loading weights: 92%|█████████▏| 136/148 [00:00<00:00, 1520.57it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights: 92%|█████████▏| 136/148 [00:00<00:00, 1517.79it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights: 93%|█████████▎| 137/148 [00:00<00:00, 1524.87it/s, Materializing param=transformer.h.11.ln_1.bias]
Loading weights: 93%|█████████▎| 137/148 [00:00<00:00, 1522.13it/s, Materializing param=transformer.h.11.ln_1.bias]
Loading weights: 93%|█████████▎| 138/148 [00:00<00:00, 1529.25it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights: 93%|█████████▎| 138/148 [00:00<00:00, 1526.51it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights: 94%|█████████▍| 139/148 [00:00<00:00, 1533.65it/s, Materializing param=transformer.h.11.ln_2.bias]
Loading weights: 94%|█████████▍| 139/148 [00:00<00:00, 1530.90it/s, Materializing param=transformer.h.11.ln_2.bias]
Loading weights: 95%|█████████▍| 140/148 [00:00<00:00, 1537.55it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights: 95%|█████████▍| 140/148 [00:00<00:00, 1534.85it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights: 95%|█████████▌| 141/148 [00:00<00:00, 1541.89it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights: 95%|█████████▌| 141/148 [00:00<00:00, 1539.17it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights: 96%|█████████▌| 142/148 [00:00<00:00, 1546.01it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights: 96%|█████████▌| 142/148 [00:00<00:00, 1543.29it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights: 97%|█████████▋| 143/148 [00:00<00:00, 1550.11it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights: 97%|█████████▋| 143/148 [00:00<00:00, 1547.39it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights: 97%|█████████▋| 144/148 [00:00<00:00, 1554.02it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights: 97%|█████████▋| 144/148 [00:00<00:00, 1551.27it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights: 98%|█████████▊| 145/148 [00:00<00:00, 1558.02it/s, Materializing param=transformer.ln_f.bias]
Loading weights: 98%|█████████▊| 145/148 [00:00<00:00, 1555.33it/s, Materializing param=transformer.ln_f.bias]
Loading weights: 99%|█████████▊| 146/148 [00:00<00:00, 1562.28it/s, Materializing param=transformer.ln_f.weight]
Loading weights: 99%|█████████▊| 146/148 [00:00<00:00, 1559.60it/s, Materializing param=transformer.ln_f.weight]
Loading weights: 99%|█████████▉| 147/148 [00:00<00:00, 1566.43it/s, Materializing param=transformer.wpe.weight]
Loading weights: 99%|█████████▉| 147/148 [00:00<00:00, 1563.77it/s, Materializing param=transformer.wpe.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1570.63it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1567.99it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1563.87it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | |
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED | |
Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Activation checkpointing is ENABLED
Running 5 training steps...
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
Step 1/5, Loss: 12.3067
Step 2/5, Loss: 12.1271
Step 3/5, Loss: 11.9482
Step 4/5, Loss: 11.8362
Step 5/5, Loss: 11.7708
✓ Memory snapshot saved to snapshot_with_ac.pickle
✓ Peak GPU memory: 3.04 GB
============================================================
MEMORY COMPARISON SUMMARY
============================================================
Baseline (no AC): 5.12 GB
With AC: 3.04 GB
Memory Saved: 2.08 GB (40.6%)
Generate Categorical Memory Profiles with Mosaic#
Use Mosaic to generate HTML profiles for both snapshots.
if HAS_CUDA and HAS_MOSAIC_CLI:
    print("\n" + "=" * 60)
    print("MOSAIC: Categorical Memory Profiling")
    print("=" * 60)
    # Generate HTML profiles using subprocess
    result1 = subprocess.run(
        [
            "mosaic_get_memory_profile",
            "--snapshot",
            "snapshot_baseline.pickle",
            "--out-path",
            "profile_baseline.html",
            "--profile",
            "categories",
            "--preserve-allocation-order",
            "--plotter_sampling_rate",
            "20",
        ],
    )
    print()
    result2 = subprocess.run(
        [
            "mosaic_get_memory_profile",
            "--snapshot",
            "snapshot_with_ac.pickle",
            "--out-path",
            "profile_with_ac.html",
            "--profile",
            "categories",
            "--preserve-allocation-order",
            "--plotter_sampling_rate",
            "20",
        ],
    )
    if result1.returncode == 0 and result2.returncode == 0:
        print("\nGenerated profile_baseline.html")
        print("Generated profile_with_ac.html")
        print("\nDownload these files to view the interactive memory profiles.")
    else:
        print("\nNote: Mosaic profile generation encountered issues.")
        print("This may happen if running in an environment without full Mosaic support.")
============================================================
MOSAIC: Categorical Memory Profiling
============================================================
Note: Mosaic profile generation encountered issues.
This may happen if running in an environment without full Mosaic support.
Results Interpretation: Activation Checkpointing#
What We Observed#
Based on the Mosaic categorical profiling results:
| Metric | Baseline | With Activation Checkpointing | Difference |
|---|---|---|---|
| Total Peak Memory | 4.62 GB | 2.55 GB | 2.07 GB (45% reduction) |
| Activation Memory | 2.93 GB | 872.79 MB | 2.08 GB saved (71% reduction) |
| Backward/Gradient Memory | 793.39 MB | 785.27 MB | 8 MB (minimal change) |
| Optimizer State | 949.4 MB | 949.4 MB | No change |
| Unknown | 32 KB | 32 KB | No change |
Key Insights#
Primary Finding: Activation memory dropped from 2.93 GB → 872 MB (71% reduction), which accounts for nearly all the total memory savings.
Why Does This Happen?#
Activation checkpointing is a memory optimization technique that trades extra recomputation for lower activation storage (a minimal sketch of enabling it follows the two cases below):
Without AC (Baseline): All intermediate activations from the forward pass are stored in memory for use during backpropagation. GPT-2 has 12 transformer layers, each storing multiple activations (attention outputs, MLP outputs, etc.). For batch_size=4, seq_length=512, this adds up quickly.
With AC (Optimized): Only activations at checkpoint boundaries are stored; intermediate activations are recomputed during the backward pass. This dramatically reduces activation memory (71% in our case) while other memory categories remain unchanged.
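This is not the tutorial's exact helper, but on a Hugging Face model gradient checkpointing is typically switched on as in the minimal sketch below; the `use_cache = False` line avoids the warning seen in the training output above:

# Minimal sketch: enable activation checkpointing on the same "gpt2" checkpoint.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()
model.gradient_checkpointing_enable()  # recompute intermediate activations during backward
model.config.use_cache = False  # KV caching is incompatible with gradient checkpointing
model.train()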
How Mosaic Helped#
Mosaic’s categorical profiling immediately identified:
Activation memory is the category with the largest difference (2.08 GB saved)
Backward/Gradient memory stayed nearly constant (793 MB → 785 MB)
Optimizer state remained unchanged (949 MB), as expected since the model parameters do not change
Without Mosaic: You would need to manually instrument your code, track allocations, and categorize them yourself.
With Mosaic: You get instant categorical breakdowns with exact numbers, making it straightforward to identify and quantify memory optimizations.
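For comparison, the manual route looks roughly like the sketch below, using PyTorch's built-in allocator-history hooks; this is an illustration of what such instrumentation could look like, not Mosaic's or this tutorial's exact code, and categorizing the allocations in the resulting pickle is still left to you:

# Sketch: capture a raw memory snapshot by hand, then categorize allocations yourself.
import torch

torch.cuda.memory._record_memory_history(max_entries=100000)
# ... run forward / backward / optimizer steps here ...
torch.cuda.memory._dump_snapshot("manual_snapshot.pickle")
torch.cuda.memory._record_memory_history(enabled=None)  # stop recording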
Case 2: Debugging Unexpected Memory Usage#
This section demonstrates how to use Mosaic to debug when your model is using more memory than expected and you’re not sure why.
What we’ll do:
Train GPT-2 and capture a memory snapshot.
Train GPT-2 with a bug that introduces additional memory and capture a memory snapshot.
Use Mosaic to identify potential culprits introducing additional memory.
The Buggy Model#
This model has abandoned debug code that creates unnecessary GPU memory overhead. Someone added projection layers to “analyze hidden states” during debugging, but forgot to remove them before training. A rough estimate of the extra memory appears after the class definition below.
class GPT2WithDebugOverhead(GPT2LMHeadModel):
    """GPT2 with abandoned 'feature analysis' code that bloats peak memory."""

    def __init__(self, config):
        super().__init__(config)
        # BUG: Large projection layers from an abandoned experiment
        self.debug_projections = torch.nn.ModuleList(
            [
                torch.nn.Linear(config.n_embd, config.n_embd * 4)
                for _ in range(config.n_layer)
            ]
        )
        debug_params = sum(p.numel() for p in self.debug_projections.parameters())
        print(f" [DEBUG] Added {config.n_layer} debug projection layers")
        print(f" [DEBUG] Extra parameters: {debug_params:,}")

    def forward(self, input_ids=None, labels=None, **kwargs):
        # Run normal GPT-2 forward with hidden states
        outputs = super().forward(
            input_ids=input_ids,
            labels=labels,
            output_hidden_states=True,
            **kwargs,
        )
        # BUG: Project all hidden states through debug layers
        projected = []
        for _layer_idx, (hidden, proj) in enumerate(
            zip(outputs.hidden_states[1:], self.debug_projections)
        ):
            proj_hidden = proj(hidden)
            projected.append(proj_hidden)
        # Tie to loss so gradients flow through
        debug_regularization = sum(p.mean() for p in projected) * 1e-10
        return CausalLMOutputWithCrossAttentions(
            loss=outputs.loss + debug_regularization,
            logits=outputs.logits,
        )
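As a rough, back-of-the-envelope check on why this hurts, the figures below assume fp32, batch_size=4, seq_length=512, n_embd=768, and n_layer=12 (matching the settings used in this tutorial) and are only an estimate:

# Each debug projection emits a (batch, seq, 4 * n_embd) tensor that autograd keeps for
# backward, on top of the extra parameters plus their gradients and AdamW state.
batch, seq, n_embd, n_layer = 4, 512, 768, 12
extra_act_bytes = n_layer * batch * seq * (4 * n_embd) * 4  # saved activations, fp32
extra_params = n_layer * (n_embd * 4 * n_embd + 4 * n_embd)  # weights + biases
print(f"extra activations: ~{extra_act_bytes / 1024**3:.2f} GB")  # roughly 0.28 GB
print(f"extra parameters:  ~{extra_params / 1e6:.1f} M")  # roughly 28.3 M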
Training Functions for Debug Comparison#
def run_training_clean(snapshot_path, num_steps=3):
    """Training with the normal model."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    device = torch.device("cuda")
    print("Loading clean model (no debug overhead)...")
    model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
    model.train()
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    dataset = RandomTokenDataset(
        vocab_size=tokenizer.vocab_size, seq_length=512, seed=42
    )
    dataloader = DataLoader(dataset, batch_size=4, shuffle=False)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    print("Running training (should contain no debug overhead)...")
    with capture_memory_snapshot(snapshot_path):
        for step, batch in enumerate(dataloader):
            if step >= num_steps:
                break
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            print(f" Step {step + 1}, Loss: {loss.item():.4f}")
    peak_memory = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory: {peak_memory:.2f} GB")
    del model, optimizer
    torch.cuda.empty_cache()
    return peak_memory
def run_training_with_bug(snapshot_path, num_steps=3):
    """Training with the buggy model."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    device = torch.device("cuda")
    print("Loading buggy model with debug overhead...")
    config = GPT2Config.from_pretrained("gpt2")
    model = GPT2WithDebugOverhead(config).to(device)
    # Load pretrained weights
    pretrained = GPT2LMHeadModel.from_pretrained("gpt2")
    model.load_state_dict(pretrained.state_dict(), strict=False)
    del pretrained
    torch.cuda.empty_cache()
    model.train()
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    dataset = RandomTokenDataset(
        vocab_size=tokenizer.vocab_size, seq_length=512, seed=42
    )
    dataloader = DataLoader(dataset, batch_size=4, shuffle=False)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    print("Running training (WITH debug overhead bug)...")
    with capture_memory_snapshot(snapshot_path):
        for step, batch in enumerate(dataloader):
            if step >= num_steps:
                break
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            print(f" Step {step + 1}, Loss: {loss.item():.4f}")
    peak_memory = torch.cuda.max_memory_allocated() / 1024**3
    print(f"Peak GPU memory: {peak_memory:.2f} GB")
    del model, optimizer
    torch.cuda.empty_cache()
    return peak_memory
Run Training for Baseline (Clean Model)#
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("Training with baseline model")
    print("=" * 60)
    baseline_memory_debug = run_training_clean(
        "snapshot_debug_baseline.pickle", num_steps=3
    )
============================================================
Training with baseline model
============================================================
Loading clean model (no debug overhead)...
Loading weights: 0%| | 0/148 [00:00<?, ?it/s]
Loading weights: 88%|████████▊ | 130/148 [00:00<00:00, 1377.45it/s, Materializing param=transformer.h.10.mlp.c_fc.weight]
Loading weights: 89%|████████▊ | 131/148 [00:00<00:00, 1384.65it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights: 89%|████████▊ | 131/148 [00:00<00:00, 1382.11it/s, Materializing param=transformer.h.10.mlp.c_proj.bias]
Loading weights: 89%|████████▉ | 132/148 [00:00<00:00, 1389.00it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights: 89%|████████▉ | 132/148 [00:00<00:00, 1386.59it/s, Materializing param=transformer.h.10.mlp.c_proj.weight]
Loading weights: 90%|████████▉ | 133/148 [00:00<00:00, 1393.70it/s, Materializing param=transformer.h.11.attn.c_attn.bias]
Loading weights: 90%|████████▉ | 133/148 [00:00<00:00, 1391.38it/s, Materializing param=transformer.h.11.attn.c_attn.bias]
Loading weights: 91%|█████████ | 134/148 [00:00<00:00, 1398.43it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights: 91%|█████████ | 134/148 [00:00<00:00, 1396.07it/s, Materializing param=transformer.h.11.attn.c_attn.weight]
Loading weights: 91%|█████████ | 135/148 [00:00<00:00, 1402.90it/s, Materializing param=transformer.h.11.attn.c_proj.bias]
Loading weights: 91%|█████████ | 135/148 [00:00<00:00, 1400.56it/s, Materializing param=transformer.h.11.attn.c_proj.bias]
Loading weights: 92%|█████████▏| 136/148 [00:00<00:00, 1407.39it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights: 92%|█████████▏| 136/148 [00:00<00:00, 1405.04it/s, Materializing param=transformer.h.11.attn.c_proj.weight]
Loading weights: 93%|█████████▎| 137/148 [00:00<00:00, 1411.83it/s, Materializing param=transformer.h.11.ln_1.bias]
Loading weights: 93%|█████████▎| 137/148 [00:00<00:00, 1409.50it/s, Materializing param=transformer.h.11.ln_1.bias]
Loading weights: 93%|█████████▎| 138/148 [00:00<00:00, 1416.34it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights: 93%|█████████▎| 138/148 [00:00<00:00, 1413.70it/s, Materializing param=transformer.h.11.ln_1.weight]
Loading weights: 94%|█████████▍| 139/148 [00:00<00:00, 1420.45it/s, Materializing param=transformer.h.11.ln_2.bias]
Loading weights: 94%|█████████▍| 139/148 [00:00<00:00, 1418.16it/s, Materializing param=transformer.h.11.ln_2.bias]
Loading weights: 95%|█████████▍| 140/148 [00:00<00:00, 1424.88it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights: 95%|█████████▍| 140/148 [00:00<00:00, 1422.58it/s, Materializing param=transformer.h.11.ln_2.weight]
Loading weights: 95%|█████████▌| 141/148 [00:00<00:00, 1429.22it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights: 95%|█████████▌| 141/148 [00:00<00:00, 1426.71it/s, Materializing param=transformer.h.11.mlp.c_fc.bias]
Loading weights: 96%|█████████▌| 142/148 [00:00<00:00, 1433.30it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights: 96%|█████████▌| 142/148 [00:00<00:00, 1430.78it/s, Materializing param=transformer.h.11.mlp.c_fc.weight]
Loading weights: 97%|█████████▋| 143/148 [00:00<00:00, 1437.35it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights: 97%|█████████▋| 143/148 [00:00<00:00, 1434.82it/s, Materializing param=transformer.h.11.mlp.c_proj.bias]
Loading weights: 97%|█████████▋| 144/148 [00:00<00:00, 1441.34it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights: 97%|█████████▋| 144/148 [00:00<00:00, 1438.84it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights: 98%|█████████▊| 145/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.h.11.mlp.c_proj.weight]
Loading weights: 98%|█████████▊| 145/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.ln_f.bias]
Loading weights: 98%|█████████▊| 145/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.ln_f.bias]
Loading weights: 99%|█████████▊| 146/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.ln_f.weight]
Loading weights: 99%|█████████▊| 146/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.ln_f.weight]
Loading weights: 99%|█████████▉| 147/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.wpe.weight]
Loading weights: 99%|█████████▉| 147/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.wpe.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1445.22it/s, Materializing param=transformer.wte.weight]
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1448.19it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | |
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Running training (should contain no debug overhead)...
Step 1, Loss: 12.2905
Step 2, Loss: 12.0938
Step 3, Loss: 12.0060
✓ Memory snapshot saved to snapshot_debug_baseline.pickle
Peak GPU memory: 5.13 GB
Run Training WITH the Bug#
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("Training with debug projection overhead (BUG)")
    print("=" * 60)
    try:
        buggy_memory = run_training_with_bug("snapshot_with_bug.pickle", num_steps=3)
    except (AttributeError, ValueError) as e:
        # Handle transformers version compatibility issues
        print(f"Note: Skipping buggy model demo due to transformers compatibility: {e}")
        buggy_memory = baseline_memory_debug
============================================================
Training with debug projection overhead (BUG)
============================================================
Loading buggy model with debug overhead...
Note: Skipping buggy model demo due to transformers compatibility: GPT2Model does not support setting experts implementation.
Use Mosaic to Find the Problem#
Analyze both snapshots to identify the source of extra memory usage.
if HAS_CUDA and HAS_MOSAIC_CLI:
    print("\n" + "=" * 60)
    print("MOSAIC: Analyzing the Baseline Snapshot")
    print("=" * 60)
    subprocess.run(
        ["mosaic_get_memory_usage_peak", "--snapshot", "snapshot_debug_baseline.pickle"],
    )

    print("\n" + "=" * 60)
    print("MOSAIC: Analyzing the Buggy Snapshot")
    print("=" * 60)
    subprocess.run(
        ["mosaic_get_memory_usage_peak", "--snapshot", "snapshot_with_bug.pickle"],
    )
============================================================
MOSAIC: Analyzing the Baseline Snapshot
============================================================
============================================================
MOSAIC: Analyzing the Buggy Snapshot
============================================================
Analyzing The Mosaic Output#
When you run Mosaic’s peak memory analysis, it shows stack traces for each memory allocation. Let’s look at how to find abandoned or unnecessary code that’s bloating the memory.
1. Optimizer State Allocations Delta#
In the buggy snapshot output, the first two stack traces represent the optimizer state allocations (the zeros_like calls that create Adam's optimizer state); see torch/optim/adam.py in the stack trace.
In the buggy model's snapshot we see roughly 0.21 GB more memory in total:
| Version | Stack Trace Position | Calls | Memory (per trace) |
|---|---|---|---|
| Buggy model | 1st and 2nd | 172 calls | 0.569 GB + 0.569 GB |
| Baseline | 2nd and 3rd | 148 calls | 0.464 GB + 0.464 GB |
What this tells us: The optimizer is tracking more tensors! This is your first clue that there are extra parameters or tensors in the computation graph.
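To make this clue concrete, here is a rough, assumption-level estimate (not Mosaic output): Adam keeps two float32 buffers (exp_avg and exp_avg_sq) per trainable parameter, each created with torch.zeros_like on the first optimizer step, which is what the zeros_like stack traces above correspond to.

import torch

def estimate_adam_state_gib(model: torch.nn.Module) -> float:
    """Rough size of Adam's exp_avg + exp_avg_sq buffers, assuming fp32 state."""
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    # Two fp32 state buffers per parameter element, 4 bytes each.
    return 2 * n_params * 4 / 1024**3

# GPT-2 small has roughly 124M trainable parameters, so each state buffer is
# about 124e6 * 4 bytes ≈ 0.46 GiB -- in line with the two ~0.464 GB traces in
# the baseline row above. Any extra (debug) parameters grow this total.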
2. Additional Activation Allocations#
The buggy version shows extra allocations that don’t appear in the baseline model. Scrolling down the Mosaic output of the buggy model we can see additional stack traces which contain:
torch::autograd::Engine::evaluate_function: We're in the backward pass
AddmmBackward0::apply: Computing gradients for an addmm operation
empty_cuda at the bottom: Allocating a new CUDA tensor to store the gradient
0.176 GB from matrix multiply gradients (AddmmBackward0, mm_mat1_backward)
Memory Total Explanation#
Total Peak Dynamic Memory Usage: This is the peak memory that changes during execution, measured relative to the starting point of the snapshot. It tracks memory allocations that occur during the traced execution timeline.
Total Static Memory Usage: This is the “starting memory” or baseline memory that exists before tracing begins. It’s estimated by the PyTorch visualizer and remains constant throughout the snapshot (doesn’t come with stack traces).
Note
In the snapshots you may observe differences in total static memory usage, which accounts for the remaining difference.
Total Overall Peak Memory Usage: Dynamic + Static
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("COMPARISON")
    print("=" * 60)
    print(f"Baseline (clean model): {baseline_memory_debug:.2f} GB")
    print(f"With bug (debug projections): {buggy_memory:.2f} GB")
    print(
        f"Extra memory from bug: {buggy_memory - baseline_memory_debug:.2f} GB"
    )
============================================================
COMPARISON
============================================================
Baseline (clean model): 5.13 GB
With bug (debug projections): 5.13 GB
Extra memory from bug: 0.00 GB
Case 3: Integrating Memory Analysis into Your Training Pipeline#
This section demonstrates how to automatically capture memory snapshots during training, obtain structured memory breakdown data for monitoring and dashboards, and build automated memory monitoring for large-scale training by using Mosaic programmatically (as a Python dependency).
This integrates memory analysis directly into your training pipeline.
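For reference, the capture_memory_snapshot helper used below wraps PyTorch's built-in snapshot recording. A minimal sketch of such a context manager follows, using the underscore-prefixed torch.cuda.memory APIs; the name capture_memory_snapshot_sketch and the max_entries default are illustrative, not part of Mosaic:

from contextlib import contextmanager

import torch

@contextmanager
def capture_memory_snapshot_sketch(snapshot_path, max_entries=100_000):
    """Minimal sketch: record allocator history for the block, then dump it."""
    # Start recording allocation/free events together with their stack traces.
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    try:
        yield
    finally:
        # Write the recorded history to a pickle file that Mosaic can analyze.
        torch.cuda.memory._dump_snapshot(snapshot_path)
        # Stop recording so later code runs without tracing overhead.
        torch.cuda.memory._record_memory_history(enabled=None)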
Training with Automatic Memory Capture#
def run_training_with_memory_capture(
    batch_size=4,
    seq_length=512,
    num_steps=5,
    snapshot_path="training_snapshot.pickle",
):
    """Run training and automatically capture memory snapshot."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    device = torch.device("cuda")
    model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
    model.train()

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    dataset = RandomTokenDataset(tokenizer.vocab_size, seq_length)
    dataloader = DataLoader(dataset, batch_size=batch_size)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    print(f"Running {num_steps} training steps with memory capture...")
    with capture_memory_snapshot(snapshot_path):
        for step, batch in enumerate(dataloader):
            if step >= num_steps:
                break
            batch = {k: v.to(device) for k, v in batch.items()}
            optimizer.zero_grad()
            outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
            outputs.loss.backward()
            optimizer.step()
            print(f" Step {step + 1}/{num_steps}, Loss: {outputs.loss.item():.4f}")

    peak_memory_gb = torch.cuda.max_memory_allocated() / 1024**3
    print(f"✓ PyTorch reported peak memory: {peak_memory_gb:.3f} GB")

    del model, optimizer
    torch.cuda.empty_cache()
    return snapshot_path
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("CASE 3: Pipeline Integration")
    print("=" * 60)
    pipeline_snapshot_path = run_training_with_memory_capture(batch_size=4, seq_length=512)
============================================================
CASE 3: Pipeline Integration
============================================================
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1624.21it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key | Status | |
---------------------+------------+--+-
h.{0...11}.attn.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Running 5 training steps with memory capture...
Step 1/5, Loss: 12.3107
Step 2/5, Loss: 12.1208
Step 3/5, Loss: 11.8964
Step 4/5, Loss: 11.8080
Step 5/5, Loss: 11.6887
✓ Memory snapshot saved to training_snapshot.pickle
✓ PyTorch reported peak memory: 5.126 GB
Mosaic Memory Analysis via Python API#
Instead of using CLI commands, we can use Mosaic’s Python API directly for programmatic integration.
if HAS_CUDA:
    print("\n" + "=" * 60)
    print("MOSAIC MEMORY ANALYSIS (via Python API)")
    print("=" * 60)

    # Load and analyze the memory snapshot
    memory_abstract = MemoryAbstract(memory_snapshot_file=pipeline_snapshot_path)
    memory_abstract.load_memory_snapshot()

    # Analyze peak memory usage
    memory_abstract.memory_snapshot.analyze_memory_snapshot(opt="memory_peak")

    # Get results
    dynamic_peak = memory_abstract.memory_snapshot.dynamic_memory_peak
    static_memory = memory_abstract.memory_snapshot.static_memory
    overall_peak = dynamic_peak + static_memory

    print(f"Peak dynamic memory: {dynamic_peak / 1024**3:.3f} GiB")
    print(f"Static memory: {static_memory / 1024**3:.3f} GiB")
    print(f"Overall peak memory: {overall_peak / 1024**3:.3f} GiB")
    print("✓ Analysis complete using Mosaic Python API")
============================================================
MOSAIC MEMORY ANALYSIS (via Python API)
============================================================
Peak dynamic memory: 4.620 GiB
Static memory: 0.495 GiB
Overall peak memory: 5.115 GiB
✓ Analysis complete using Mosaic Python API
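The same calls compose into a small comparison helper, for example to quantify the delta between the baseline and buggy snapshots from Case 2. The sketch below reuses the MemoryAbstract import from above; the function name and the GiB return value are illustrative choices:

def compare_dynamic_peaks(baseline_path, candidate_path):
    """Return the dynamic-peak difference (candidate minus baseline) in GiB."""
    peaks = []
    for path in (baseline_path, candidate_path):
        abstract = MemoryAbstract(memory_snapshot_file=path)
        abstract.load_memory_snapshot()
        abstract.memory_snapshot.analyze_memory_snapshot(opt="memory_peak")
        peaks.append(abstract.memory_snapshot.dynamic_memory_peak)
    return (peaks[1] - peaks[0]) / 1024**3

# Example usage with the snapshots captured earlier in this tutorial:
# delta = compare_dynamic_peaks("snapshot_debug_baseline.pickle", "snapshot_with_bug.pickle")
# print(f"Dynamic peak delta: {delta:+.3f} GiB")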
Reusable Memory Analysis Function#
Create a reusable function for analyzing training memory snapshots.
def analyze_training_memory(snapshot_path):
    """Analyze a memory snapshot using Mosaic's Python API.

    Returns a structured dictionary with memory breakdown.

    Args:
        snapshot_path: Path to the memory snapshot pickle file.

    Returns:
        Dictionary containing memory analysis results.
    """
    # Load snapshot
    memory_abstract = MemoryAbstract(memory_snapshot_file=snapshot_path)
    memory_abstract.load_memory_snapshot()

    # Analyze peak memory
    memory_abstract.memory_snapshot.analyze_memory_snapshot(opt="memory_peak")

    # Extract results
    dynamic_peak = memory_abstract.memory_snapshot.dynamic_memory_peak
    static_memory = memory_abstract.memory_snapshot.static_memory
    overall_peak = dynamic_peak + static_memory

    return {
        "snapshot_path": snapshot_path,
        "dynamic_peak_memory_bytes": dynamic_peak,
        "static_memory_bytes": static_memory,
        "overall_peak_memory_bytes": overall_peak,
        "dynamic_peak_memory_gib": dynamic_peak / 1024**3,
        "static_memory_gib": static_memory / 1024**3,
        "overall_peak_memory_gib": overall_peak / 1024**3,
    }


if HAS_CUDA:
    analysis = analyze_training_memory(pipeline_snapshot_path)
    print("\nMemory Analysis Result:")
    for key, value in analysis.items():
        print(f" {key}: {value}")
Memory Analysis Result:
snapshot_path: training_snapshot.pickle
dynamic_peak_memory_bytes: 4960794632
static_memory_bytes: 531589120
overall_peak_memory_bytes: 5492383752
dynamic_peak_memory_gib: 4.620100028812885
static_memory_gib: 0.49508094787597656
overall_peak_memory_gib: 5.115180976688862
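Because the result is a plain dictionary, it can feed dashboards or CI checks directly. The sketch below writes it to JSON and enforces a simple budget; the 6.0 GiB threshold and the memory_report.json file name are illustrative values, not Mosaic recommendations:

import json

if HAS_CUDA:
    # Persist the structured breakdown for dashboards or later comparison.
    with open("memory_report.json", "w") as f:
        json.dump(analysis, f, indent=2)

    # Fail fast in CI if the job exceeds an agreed memory budget.
    MEMORY_BUDGET_GIB = 6.0
    if analysis["overall_peak_memory_gib"] > MEMORY_BUDGET_GIB:
        raise RuntimeError(
            f"Peak memory {analysis['overall_peak_memory_gib']:.2f} GiB "
            f"exceeds the {MEMORY_BUDGET_GIB} GiB budget"
        )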
Complete Training Pipeline with Memory Monitoring#
This demonstrates a production-ready training pipeline with integrated Mosaic memory monitoring that can be used in CI/CD, monitoring dashboards, or capacity planning.
def training_pipeline_with_memory_monitoring(
    model_name: str,
    batch_size: int,
    seq_length: int,
    num_steps: int = 5,
    snapshot_path: str = "pipeline_snapshot.pickle",
) -> dict:
    """Complete training pipeline with integrated Mosaic memory monitoring.

    Can be integrated into CI/CD, monitoring dashboards, or capacity planning.

    Args:
        model_name: HuggingFace model name to use.
        batch_size: Training batch size.
        seq_length: Sequence length for input tokens.
        num_steps: Number of training steps.
        snapshot_path: Path to save the memory snapshot.

    Returns:
        Dictionary containing training and memory analysis report.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Setup
    print(f"Loading model: {model_name}")
    model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)

    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    # Training with memory capture
    print(f"Running {num_steps} training steps...")
    with capture_memory_snapshot(snapshot_path):
        for step in range(num_steps):
            input_ids = torch.randint(
                0, tokenizer.vocab_size, (batch_size, seq_length)
            ).to(device)
            outputs = model(input_ids=input_ids, labels=input_ids)
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            print(f" Step {step + 1}/{num_steps}, Loss: {outputs.loss.item():.4f}")

    pytorch_peak_gb = torch.cuda.max_memory_allocated() / 1024**3

    # Mosaic analysis using Python API
    print("Analyzing memory with Mosaic...")
    memory_abstract = MemoryAbstract(memory_snapshot_file=snapshot_path)
    memory_abstract.load_memory_snapshot()
    memory_abstract.memory_snapshot.analyze_memory_snapshot(opt="memory_peak")

    dynamic_peak = memory_abstract.memory_snapshot.dynamic_memory_peak
    static_memory = memory_abstract.memory_snapshot.static_memory
    overall_peak = dynamic_peak + static_memory

    report = {
        "model": model_name,
        "config": {
            "batch_size": batch_size,
            "seq_length": seq_length,
            "num_steps": num_steps,
        },
        "pytorch_peak_memory_gb": pytorch_peak_gb,
        "mosaic_analysis": {
            "dynamic_peak_gib": dynamic_peak / 1024**3,
            "static_memory_gib": static_memory / 1024**3,
            "overall_peak_gib": overall_peak / 1024**3,
        },
        "snapshot_path": snapshot_path,
    }

    del model, optimizer
    torch.cuda.empty_cache()
    return report


# Run the pipeline
if HAS_CUDA:
    report = training_pipeline_with_memory_monitoring(
        "gpt2", batch_size=4, seq_length=512, num_steps=5
    )
    print("\n" + "=" * 60)
    print("PIPELINE REPORT")
    print("=" * 60)
    print(f"Model: {report['model']}")
    print(f"Config: {report['config']}")
    print(f"PyTorch Peak Memory: {report['pytorch_peak_memory_gb']:.3f} GB")
    print(f"Mosaic Dynamic Peak: {report['mosaic_analysis']['dynamic_peak_gib']:.3f} GiB")
    print(f"Mosaic Overall Peak: {report['mosaic_analysis']['overall_peak_gib']:.3f} GiB")
Loading model: gpt2
Loading weights: 100%|██████████| 148/148 [00:00<00:00, 1627.90it/s, Materializing param=transformer.wte.weight]
GPT2LMHeadModel LOAD REPORT from: gpt2
Key                  | Status
---------------------+------------
h.{0...11}.attn.bias | UNEXPECTED
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Running 5 training steps...
Step 1/5, Loss: 12.2789
Step 2/5, Loss: 12.0998
Step 3/5, Loss: 11.9411
Step 4/5, Loss: 11.8859
Step 5/5, Loss: 11.7158
✓ Memory snapshot saved to pipeline_snapshot.pickle
Analyzing memory with Mosaic...
============================================================
PIPELINE REPORT
============================================================
Model: gpt2
Config: {'batch_size': 4, 'seq_length': 512, 'num_steps': 5}
PyTorch Peak Memory: 5.126 GB
Mosaic Dynamic Peak: 4.620 GiB
Mosaic Overall Peak: 5.115 GiB
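The same Python API also extends naturally to multi-GPU jobs where every rank captures its own snapshot. The sketch below is illustrative only: it assumes hypothetical per-rank snapshot files (for example snapshot_rank0.pickle, snapshot_rank1.pickle, ...) and reuses only the MemoryAbstract calls demonstrated in the pipeline above to compare overall peaks across ranks and surface memory imbalance:

def compare_rank_peaks(snapshot_paths):
    """Illustrative sketch: compare overall peak memory across per-rank snapshots."""
    peaks = {}
    for path in snapshot_paths:
        # Same analysis calls as in the pipeline above, applied per snapshot
        abstract = MemoryAbstract(memory_snapshot_file=path)
        abstract.load_memory_snapshot()
        abstract.memory_snapshot.analyze_memory_snapshot(opt="memory_peak")
        snap = abstract.memory_snapshot
        peaks[path] = (snap.dynamic_memory_peak + snap.static_memory) / 1024**3
    # Print ranks from largest to smallest peak to make imbalance obvious
    for path, peak_gib in sorted(peaks.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{path}: {peak_gib:.2f} GiB")
    return peaks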
CI/CD and Dashboard Integration Patterns#
These patterns show how to integrate Mosaic analysis into automated workflows.
import json
Pattern 1: CI/CD Memory Regression Testing#
def check_memory_regression(report, threshold_gib=5.0):
"""Check if memory usage exceeds threshold for CI/CD pipelines.
Args:
report: Memory analysis report from training_pipeline_with_memory_monitoring.
threshold_gib: Maximum allowed memory in GiB.
Raises:
AssertionError: If memory exceeds threshold.
"""
peak = report["mosaic_analysis"]["overall_peak_gib"]
assert peak < threshold_gib, (
f"Memory regression! {peak:.2f} GiB > {threshold_gib} GiB"
)
print(f"Memory check passed: {peak:.2f} GiB < {threshold_gib} GiB threshold")
Pattern 2: Export to JSON for Dashboards#
if HAS_CUDA:
check_memory_regression(report, threshold_gib=8.0)
with open("memory_report.json", "w") as f:
json.dump(report, f, indent=2, default=str)
print("Memory report exported to memory_report.json")
Memory check passed: 5.12 GiB < 8.0 GiB threshold
Memory report exported to memory_report.json
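For dashboards that track memory over time, comparing each run against a stored baseline is often more useful than a fixed threshold. The sketch below assumes a hypothetical baseline file, baseline_memory_report.json, with the same structure as the memory_report.json exported above:

def check_against_baseline(report, baseline_path="baseline_memory_report.json", tolerance=0.05):
    """Illustrative sketch: flag a regression if the current overall peak exceeds
    the baseline peak by more than the given fractional tolerance."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    current = report["mosaic_analysis"]["overall_peak_gib"]
    previous = float(baseline["mosaic_analysis"]["overall_peak_gib"])
    limit = previous * (1 + tolerance)
    assert current <= limit, (
        f"Memory regression vs baseline: {current:.2f} GiB > {limit:.2f} GiB"
    )
    print(f"Within baseline: {current:.2f} GiB <= {limit:.2f} GiB")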
Conclusion#
This tutorial demonstrated three key use cases for Mosaic memory profiling:
Case 1: Activation Checkpointing Analysis
Used Mosaic to compare memory usage between baseline and optimized models
Identified that activation checkpointing reduced activation memory by 71%
Mosaic’s categorical profiling made it trivial to pinpoint memory savings
Case 2: Debugging Unexpected Memory Usage
Created a “buggy” model with abandoned debug code
Used mosaic_get_memory_usage_peak to identify extra allocations
Stack traces revealed optimizer state tracking extra parameters
Case 3: Pipeline Integration
Demonstrated programmatic usage via Mosaic’s Python API
Showed integration patterns for CI/CD and dashboards with structured reports
Further Reading#
Total running time of the script: (0 minutes 14.368 seconds)