Hyperparameter tuning using Ray Tune#
Created On: Aug 31, 2020 | Last Updated: Jan 08, 2026 | Last Verified: Nov 05, 2024
Author: Ricardo Decal
This tutorial shows how to integrate Ray Tune into your PyTorch training workflow to perform scalable and efficient hyperparameter tuning.
In this tutorial, you will learn:
- How to modify a PyTorch training loop for Ray Tune
- How to scale a hyperparameter sweep to multiple nodes and GPUs without code changes
- How to define a hyperparameter search space and run a sweep with tune.Tuner
- How to use an early-stopping scheduler (ASHA) and report metrics/checkpoints
- How to use checkpointing to resume training and load the best model

Prerequisites:
- PyTorch v2.9+ and torchvision
- Ray Tune (ray[tune]) v2.52.1+
- GPU(s) are optional, but recommended for faster training
Ray, a project of the PyTorch Foundation, is an open source unified framework for scaling AI and Python applications. It helps run distributed jobs by handling the complexity of distributed computing. Ray Tune is a library built on Ray for hyperparameter tuning that enables you to scale a hyperparameter sweep from your machine to a large cluster with no code changes.
This tutorial adapts the PyTorch tutorial for training a CIFAR10 classifier to run multi-GPU hyperparameter sweeps with Ray Tune.
Setup#
To run this tutorial, install the following dependencies:
pip install "ray[tune]" torchvision
Then start with the imports:
from functools import partial
import os
import tempfile
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import random_split
import torchvision
import torchvision.transforms as transforms
# New: imports for Ray Tune
import ray
from ray import tune
from ray.tune import Checkpoint
from ray.tune.schedulers import ASHAScheduler
Data loading#
Wrap the data loaders in a constructor function. In this tutorial, a global data directory is passed to the function to enable reusing the dataset across different trials. In a cluster environment, you can use shared storage, such as network file systems, to prevent each node from downloading the data separately.
def load_data(data_dir="./data"):
# Mean and standard deviation of the CIFAR10 training subset.
transform = transforms.Compose(
[transforms.ToTensor(), transforms.Normalize((0.4914, 0.48216, 0.44653), (0.2022, 0.19932, 0.20086))]
)
trainset = torchvision.datasets.CIFAR10(
root=data_dir, train=True, download=True, transform=transform
)
testset = torchvision.datasets.CIFAR10(
root=data_dir, train=False, download=True, transform=transform
)
return trainset, testset
Model architecture#
This tutorial searches for the best sizes for the fully connected layers
and the learning rate. To enable this, the Net class exposes the
layer sizes l1 and l2 as configurable parameters that Ray Tune
can search over:
class Net(nn.Module):
def __init__(self, l1=120, l2=84):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16 * 5 * 5, l1)
self.fc2 = nn.Linear(l1, l2)
self.fc3 = nn.Linear(l2, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
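The in_features of fc1 (16 * 5 * 5) follows from the conv/pool geometry for 32x32 CIFAR10 inputs. A quick sanity check of that arithmetic, using the standard output-size formula for convolutions:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output spatial size of a convolution layer
    return (size + 2 * padding - kernel) // stride + 1

size = 32                 # CIFAR10 images are 32x32
size = conv_out(size, 5)  # conv1 (5x5 kernel) -> 28
size = size // 2          # 2x2 max pool       -> 14
size = conv_out(size, 5)  # conv2 (5x5 kernel) -> 10
size = size // 2          # 2x2 max pool       -> 5
print(16 * size * size)   # 400 == 16 * 5 * 5, the flattened feature count
```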
Define the search space#
Next, define the hyperparameters to tune and how Ray Tune samples them.
Ray Tune offers a variety of search space
distributions
to suit different parameter types: loguniform, uniform,
choice, randint, grid, and more. You can also express
complex dependencies between parameters with conditional search
spaces
or sample from arbitrary functions.
Here is the search space for this tutorial:
config = {
"l1": tune.choice([2**i for i in range(9)]),
"l2": tune.choice([2**i for i in range(9)]),
"lr": tune.loguniform(1e-4, 1e-1),
"batch_size": tune.choice([2, 4, 8, 16]),
}
tune.choice() accepts a list of values to sample from uniformly. In this example, the l1 and l2 values are powers of 2 between 1 and 256, and the learning rate is sampled on a log scale between 0.0001 and 0.1. Sampling on a log scale explores magnitudes on a relative scale rather than an absolute one.
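To see why log-scale sampling matters, here is a minimal sketch of what tune.loguniform does conceptually: sample uniformly in log space, then exponentiate. The helper name here is illustrative, not Ray's implementation:

```python
import math
import random

def loguniform(low, high):
    # Uniform in log10 space, so each order of magnitude is equally likely
    u = random.uniform(math.log10(low), math.log10(high))
    return 10 ** u

samples = [loguniform(1e-4, 1e-1) for _ in range(1000)]
# Every sample stays inside the requested range
assert all(1e-4 <= s <= 1e-1 for s in samples)
```

With a plain uniform distribution over [1e-4, 1e-1], roughly 99% of samples would exceed 1e-3; log-uniform sampling gives the small learning rates a fair share of trials.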
Training function#
Ray Tune requires a training function that accepts a configuration dictionary and runs the main training loop. Ray Tune passes a different configuration dictionary to each trial.
Here is the full training function, followed by explanations of the key Ray Tune integration points:
def train_cifar(config, data_dir=None):
net = Net(config["l1"], config["l2"])
device = config["device"]
net = net.to(device)
if torch.cuda.device_count() > 1:
net = nn.DataParallel(net)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)
# Load checkpoint if resuming training
checkpoint = tune.get_checkpoint()
if checkpoint:
with checkpoint.as_directory() as checkpoint_dir:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
checkpoint_state = torch.load(checkpoint_path)
start_epoch = checkpoint_state["epoch"]
net.load_state_dict(checkpoint_state["net_state_dict"])
optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
else:
start_epoch = 0
trainset, _testset = load_data(data_dir)
test_abs = int(len(trainset) * 0.8)
train_subset, val_subset = random_split(
trainset, [test_abs, len(trainset) - test_abs]
)
trainloader = torch.utils.data.DataLoader(
train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
)
valloader = torch.utils.data.DataLoader(
val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
)
for epoch in range(start_epoch, 10): # loop over the dataset multiple times
running_loss = 0.0
epoch_steps = 0
for i, data in enumerate(trainloader, 0):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data
inputs, labels = inputs.to(device), labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
epoch_steps += 1
if i % 2000 == 1999: # print every 2000 mini-batches
print(
"[%d, %5d] loss: %.3f"
% (epoch + 1, i + 1, running_loss / epoch_steps)
)
running_loss = 0.0
# Validation loss
val_loss = 0.0
val_steps = 0
total = 0
correct = 0
for i, data in enumerate(valloader, 0):
with torch.no_grad():
inputs, labels = data
inputs, labels = inputs.to(device), labels.to(device)
outputs = net(inputs)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
loss = criterion(outputs, labels)
val_loss += loss.cpu().numpy()
val_steps += 1
# Save checkpoint and report metrics
checkpoint_data = {
"epoch": epoch,
"net_state_dict": net.state_dict(),
"optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
torch.save(checkpoint_data, checkpoint_path)
checkpoint = Checkpoint.from_directory(checkpoint_dir)
tune.report(
{"loss": val_loss / val_steps, "accuracy": correct / total},
checkpoint=checkpoint,
)
print("Finished Training")
Key integration points#
Using hyperparameters from the configuration dictionary#
Ray Tune updates the config dictionary with the hyperparameters for each trial. In this example, the model architecture and optimizer receive their hyperparameters from the config dictionary: Net(config["l1"], config["l2"]) and optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9).
Reporting metrics and saving checkpoints#
The most important integration is communicating with Ray Tune. Ray Tune uses the validation metrics to determine the best hyperparameter configuration and to stop underperforming trials early, saving resources.
Checkpointing enables you to later load the trained models, resume hyperparameter searches, and provides fault tolerance. It’s also required for some Ray Tune schedulers like Population Based Training that pause and resume trials during the search.
This code from the training function loads model and optimizer state at the start if a checkpoint exists:
checkpoint = tune.get_checkpoint()
if checkpoint:
with checkpoint.as_directory() as checkpoint_dir:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
checkpoint_state = torch.load(checkpoint_path)
start_epoch = checkpoint_state["epoch"]
net.load_state_dict(checkpoint_state["net_state_dict"])
optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
At the end of each epoch, save a checkpoint and report the validation metrics:
checkpoint_data = {
"epoch": epoch,
"net_state_dict": net.state_dict(),
"optimizer_state_dict": optimizer.state_dict(),
}
with tempfile.TemporaryDirectory() as checkpoint_dir:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
torch.save(checkpoint_data, checkpoint_path)
checkpoint = Checkpoint.from_directory(checkpoint_dir)
tune.report(
{"loss": val_loss / val_steps, "accuracy": correct / total},
checkpoint=checkpoint,
)
Ray Tune checkpointing supports local file systems, cloud storage, and distributed file systems. For more information, see the Ray Tune storage documentation.
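The checkpoint pattern above is a save/load round trip through a temporary directory. A minimal torch-free sketch of that pattern, with pickle standing in for torch.save/torch.load (an illustrative substitution; real checkpoints should use torch's serialization):

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical checkpoint payload, mirroring the keys used in train_cifar
state = {
    "epoch": 3,
    "net_state_dict": {"w": [0.1, 0.2]},
    "optimizer_state_dict": {"lr": 0.01},
}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
    with open(checkpoint_path, "wb") as f:
        pickle.dump(state, f)           # analogous to torch.save(...)
    with open(checkpoint_path, "rb") as f:
        restored = pickle.load(f)       # analogous to torch.load(...)

assert restored["epoch"] == 3           # resume from the saved epoch
```

In the tutorial, Checkpoint.from_directory(checkpoint_dir) then snapshots that directory so Ray Tune can persist it before the temporary directory is deleted.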
Multi-GPU support#
Image classification models can be greatly accelerated by using GPUs.
The training function supports multi-GPU training by wrapping the model
in nn.DataParallel:
if torch.cuda.device_count() > 1:
net = nn.DataParallel(net)
This training function supports training on CPUs, a single GPU, multiple GPUs, or multiple nodes without code changes. Ray Tune automatically distributes the trials across the nodes according to the available resources. Ray Tune also supports fractional GPUs so that one GPU can be shared among multiple trials, provided that the models, optimizers, and data batches fit into the GPU memory.
Validation split#
The original CIFAR10 dataset has only train and test subsets. That is sufficient for training a single model, but hyperparameter tuning requires a validation subset. The training function creates one by reserving 20% of the training subset; the test subset is used to estimate the best model's generalization error after the search completes.
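The split sizes passed to random_split work out as follows (CIFAR10's training subset has 50,000 images):

```python
n_train = 50000                      # CIFAR10 training subset size
test_abs = int(n_train * 0.8)        # 80% kept for training
split = [test_abs, n_train - test_abs]
print(split)  # [40000, 10000]
```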
Evaluation function#
After finding the optimal hyperparameters, test the model on a held-out test set to estimate the generalization error:
def test_accuracy(net, device="cpu", data_dir=None):
_trainset, testset = load_data(data_dir)
testloader = torch.utils.data.DataLoader(
testset, batch_size=4, shuffle=False, num_workers=2
)
correct = 0
total = 0
with torch.no_grad():
for data in testloader:
image_batch, labels = data
image_batch, labels = image_batch.to(device), labels.to(device)
outputs = net(image_batch)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
return correct / total
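The accuracy bookkeeping is a plain correct/total ratio over the whole test set. A toy illustration with made-up predictions and labels:

```python
# Hypothetical predictions vs. ground-truth labels for five samples
predicted = [3, 1, 4, 1, 5]
labels    = [3, 1, 0, 1, 5]

correct = sum(p == l for p, l in zip(predicted, labels))  # 4 matches
total = len(labels)                                        # 5 samples
print(correct / total)  # 0.8
```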
Configure and run Ray Tune#
With the training and evaluation functions defined, configure Ray Tune to run the hyperparameter search.
Scheduler for early stopping#
Ray Tune provides schedulers to improve the efficiency of the
hyperparameter search by detecting underperforming trials and stopping
them early. The ASHAScheduler uses the Asynchronous Successive
Halving Algorithm (ASHA) to aggressively terminate low-performing
trials:
scheduler = ASHAScheduler(
max_t=max_num_epochs,
grace_period=1,
reduction_factor=2,
)
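As a rough intuition: with grace_period=1, reduction_factor=2, and max_t=10, ASHA evaluates trials at geometrically spaced rungs (1, 2, 4, 8 epochs), keeping only the better-performing fraction at each rung. The helper below sketches that rung schedule; it is an approximation for intuition, not Ray's implementation:

```python
def asha_rungs(max_t, grace_period, reduction_factor):
    # Rungs at grace_period * reduction_factor**k, capped at max_t
    rungs, r = [], grace_period
    while r <= max_t:
        rungs.append(r)
        r *= reduction_factor
    return rungs

print(asha_rungs(max_t=10, grace_period=1, reduction_factor=2))  # [1, 2, 4, 8]
```

Underperforming trials are stopped as early as the first rung, so most compute is concentrated on promising configurations.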
Ray Tune also provides advanced search algorithms to smartly pick the next set of hyperparameters based on previous results, instead of relying only on random or grid search. Examples include Optuna and BayesOpt.
Resource allocation#
Tell Ray Tune what resources to allocate for each trial by passing a
resources dictionary to tune.with_resources:
tune.with_resources(
partial(train_cifar, data_dir=data_dir),
resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
)
Ray Tune automatically manages the placement of these trials and ensures that the trials run in isolation, so you don’t need to manually assign GPUs to processes.
For example, if you are running this experiment on a cluster of 20 machines, each with 8 GPUs, you can set gpus_per_trial = 0.5 to schedule two concurrent trials per GPU, allowing up to 320 trials to run in parallel across the cluster.
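The arithmetic behind that figure:

```python
machines = 20
gpus_per_machine = 8
gpus_per_trial = 0.5  # two trials share each GPU

# Maximum number of trials that can run at once, GPU-bound
concurrent_trials = int(machines * gpus_per_machine / gpus_per_trial)
print(concurrent_trials)  # 320
```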
Note
To run this tutorial without GPUs, set gpus_per_trial=0
and expect significantly longer runtimes.
To avoid long runtimes during development, start with a small number of trials and epochs.
Creating the Tuner#
The Ray Tune API is modular and composable. Pass your configuration to
the tune.Tuner class to create a tuner object, then run
tuner.fit() to start training:
tuner = tune.Tuner(
tune.with_resources(
partial(train_cifar, data_dir=data_dir),
resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
),
tune_config=tune.TuneConfig(
metric="loss",
mode="min",
scheduler=scheduler,
num_samples=num_trials,
),
param_space=config,
)
results = tuner.fit()
After training completes, retrieve the best performing trial, load its checkpoint, and evaluate on the test set.
Putting it all together#
def main(num_trials=10, max_num_epochs=10, gpus_per_trial=0, cpus_per_trial=2):
print("Starting hyperparameter tuning.")
ray.init(include_dashboard=False)
data_dir = os.path.abspath("./data")
load_data(data_dir) # Pre-download the dataset
device = "cuda" if torch.cuda.is_available() else "cpu"
config = {
"l1": tune.choice([2**i for i in range(9)]),
"l2": tune.choice([2**i for i in range(9)]),
"lr": tune.loguniform(1e-4, 1e-1),
"batch_size": tune.choice([2, 4, 8, 16]),
"device": device,
}
scheduler = ASHAScheduler(
max_t=max_num_epochs,
grace_period=1,
reduction_factor=2,
)
tuner = tune.Tuner(
tune.with_resources(
partial(train_cifar, data_dir=data_dir),
resources={"cpu": cpus_per_trial, "gpu": gpus_per_trial}
),
tune_config=tune.TuneConfig(
metric="loss",
mode="min",
scheduler=scheduler,
num_samples=num_trials,
),
param_space=config,
)
results = tuner.fit()
best_result = results.get_best_result("loss", "min")
print(f"Best trial config: {best_result.config}")
print(f"Best trial final validation loss: {best_result.metrics['loss']}")
print(f"Best trial final validation accuracy: {best_result.metrics['accuracy']}")
best_trained_model = Net(best_result.config["l1"], best_result.config["l2"])
best_trained_model = best_trained_model.to(device)
if gpus_per_trial > 1:
best_trained_model = nn.DataParallel(best_trained_model)
best_checkpoint = best_result.checkpoint
with best_checkpoint.as_directory() as checkpoint_dir:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.pt"
best_checkpoint_data = torch.load(checkpoint_path)
best_trained_model.load_state_dict(best_checkpoint_data["net_state_dict"])
test_acc = test_accuracy(best_trained_model, device, data_dir)
print(f"Best trial test set accuracy: {test_acc}")
if __name__ == "__main__":
# Set the number of trials, epochs, and GPUs per trial here:
main(num_trials=10, max_num_epochs=10, gpus_per_trial=1)
Starting hyperparameter tuning.
2026-02-11 23:29:25,015 WARNING services.py:2137 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147471360 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2026-02-11 23:29:25,185 INFO worker.py:2023 -- Started a local Ray instance.
/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py:2062: FutureWarning:
Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
100%|██████████| 170M/170M [00:01<00:00, 107MB/s]
╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment train_cifar_2026-02-11_23-29-30 │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm BasicVariantGenerator │
│ Scheduler AsyncHyperBandScheduler │
│ Number of trials 10 │
╰────────────────────────────────────────────────────────────────────╯
View detailed results here: /var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30
To visualize your results with TensorBoard, run: `tensorboard --logdir /tmp/ray/session_2026-02-11_23-29-23_545871_3915/artifacts/2026-02-11_23-29-30/train_cifar_2026-02-11_23-29-30/driver_artifacts`
Trial status: 10 PENDING
Current time: 2026-02-11 23:29:31. Total running time: 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 PENDING 1 32 0.0358363 2 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰───────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_83ced_00000 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00000 config │
├──────────────────────────────────────────────────┤
│ batch_size 2 │
│ device cuda │
│ l1 1 │
│ l2 32 │
│ lr 0.03584 │
╰──────────────────────────────────────────────────╯
(func pid=5035) [1, 2000] loss: 2.343
(func pid=5035) [1, 4000] loss: 1.170
(func pid=5035) [1, 6000] loss: 0.780
(pid=gcs_server) [2026-02-11 23:29:54,079 E 3920 3920] (gcs_server) gcs_server.cc:303: Failed to establish connection to the event+metrics exporter agent. Events and metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(raylet) [2026-02-11 23:29:55,125 E 4060 4060] (raylet) main.cc:979: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(bundle_reservation_check_func pid=4134) [2026-02-11 23:29:55,801 E 4134 4337] core_worker_process.cc:837: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14
(func pid=5035) [1, 8000] loss: 0.585
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:30:01. Total running time: 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰───────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [2026-02-11 23:30:01,662 E 5035 5070] core_worker_process.cc:837: Failed to establish connection to the metrics exporter agent. Metrics will not be exported. Exporter agent status: RpcError: Running out of retries to initialize the metrics agent. rpc_code: 14 [repeated 14x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(func pid=5035) [1, 10000] loss: 0.469
(func pid=5035) [1, 12000] loss: 0.390
(func pid=5035) [1, 14000] loss: 0.334
(func pid=5035) [1, 16000] loss: 0.293
(func pid=5035) [1, 18000] loss: 0.260
(func pid=5035) [1, 20000] loss: 0.234
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:30:31. Total running time: 1min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
╭───────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size │
├───────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰───────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000000)
(func pid=5035) [2, 2000] loss: 2.344
(func pid=5035) [2, 4000] loss: 1.172
(func pid=5035) [2, 6000] loss: 0.781
(func pid=5035) [2, 8000] loss: 0.585
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:31:01. Total running time: 1min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3491710545778273 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 1 63.6209 2.34917 0.1054 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [2, 10000] loss: 0.469
(func pid=5035) [2, 12000] loss: 0.390
(func pid=5035) [2, 14000] loss: 0.335
(func pid=5035) [2, 16000] loss: 0.293
(func pid=5035) [2, 18000] loss: 0.260
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:31:31. Total running time: 2min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3491710545778273 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 1 63.6209 2.34917 0.1054 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [2, 20000] loss: 0.234
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000001)
(func pid=5035) [3, 2000] loss: 2.341
(func pid=5035) [3, 4000] loss: 1.172
(func pid=5035) [3, 6000] loss: 0.780
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:32:01. Total running time: 2min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.331923544096947 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 2 125.833 2.33192 0.0972 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [3, 8000] loss: 0.584
(func pid=5035) [3, 10000] loss: 0.469
(func pid=5035) [3, 12000] loss: 0.390
(func pid=5035) [3, 14000] loss: 0.335
(func pid=5035) [3, 16000] loss: 0.293
(func pid=5035) [3, 18000] loss: 0.260
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:32:31. Total running time: 3min 0s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.331923544096947 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 2 125.833 2.33192 0.0972 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [3, 20000] loss: 0.234
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000002)
(func pid=5035) [4, 2000] loss: 2.339
(func pid=5035) [4, 4000] loss: 1.171
(func pid=5035) [4, 6000] loss: 0.780
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:33:01. Total running time: 3min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3315538910388947 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 3 187.733 2.33155 0.0939 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [4, 8000] loss: 0.586
(func pid=5035) [4, 10000] loss: 0.468
(func pid=5035) [4, 12000] loss: 0.390
(func pid=5035) [4, 14000] loss: 0.335
(func pid=5035) [4, 16000] loss: 0.293
(func pid=5035) [4, 18000] loss: 0.260
(func pid=5035) [4, 20000] loss: 0.235
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000003)
(func pid=5035) [5, 2000] loss: 2.343
(func pid=5035) [5, 4000] loss: 1.171
(func pid=5035) [5, 6000] loss: 0.782
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:34:01. Total running time: 4min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3645524888515475 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 4 249.582 2.36455 0.0981 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [5, 8000] loss: 0.587
(func pid=5035) [5, 10000] loss: 0.469
(func pid=5035) [5, 12000] loss: 0.391
(func pid=5035) [5, 14000] loss: 0.335
(func pid=5035) [5, 16000] loss: 0.293
(func pid=5035) [5, 18000] loss: 0.261
(func pid=5035) [5, 20000] loss: 0.235
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000004)
(func pid=5035) [6, 2000] loss: 2.347
(func pid=5035) [6, 4000] loss: 1.171
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:35:01. Total running time: 5min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3439406710863113 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 5 311.658 2.34394 0.1013 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [6, 6000] loss: 0.782
(func pid=5035) [6, 8000] loss: 0.585
(func pid=5035) [6, 10000] loss: 0.469
(func pid=5035) [6, 12000] loss: 0.390
(func pid=5035) [6, 14000] loss: 0.334
(func pid=5035) [6, 16000] loss: 0.293
(func pid=5035) [6, 18000] loss: 0.260
(func pid=5035) [6, 20000] loss: 0.234
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000005)
(func pid=5035) [7, 2000] loss: 2.343
(func pid=5035) [7, 4000] loss: 1.171
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:36:01. Total running time: 6min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3349232776641844 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 6 374.063 2.33492 0.0981 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [7, 6000] loss: 0.782
(func pid=5035) [7, 8000] loss: 0.585
(func pid=5035) [7, 10000] loss: 0.469
(func pid=5035) [7, 12000] loss: 0.390
(func pid=5035) [7, 14000] loss: 0.334
(func pid=5035) [7, 16000] loss: 0.293
(func pid=5035) [7, 18000] loss: 0.261
(func pid=5035) [7, 20000] loss: 0.234
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000006)
(func pid=5035) [8, 2000] loss: 2.344
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:37:01. Total running time: 7min 30s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.340644465136528 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 7 436.337 2.34064 0.0939 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [8, 4000] loss: 1.171
(func pid=5035) [8, 6000] loss: 0.784
(func pid=5035) [8, 8000] loss: 0.586
(func pid=5035) [8, 10000] loss: 0.468
(func pid=5035) [8, 12000] loss: 0.390
(func pid=5035) [8, 14000] loss: 0.335
(func pid=5035) [8, 16000] loss: 0.293
(func pid=5035) [8, 18000] loss: 0.261
(func pid=5035) [8, 20000] loss: 0.235
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000007)
(func pid=5035) [9, 2000] loss: 2.346
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:38:01. Total running time: 8min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.337790411448479 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 8 498.573 2.33779 0.1054 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [9, 4000] loss: 1.170
(func pid=5035) [9, 6000] loss: 0.780
(func pid=5035) [9, 8000] loss: 0.585
(func pid=5035) [9, 10000] loss: 0.469
(func pid=5035) [9, 12000] loss: 0.390
(func pid=5035) [9, 14000] loss: 0.334
(func pid=5035) [9, 16000] loss: 0.293
(func pid=5035) [9, 18000] loss: 0.260
(func pid=5035) [9, 20000] loss: 0.234
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000008)
(func pid=5035) [10, 2000] loss: 2.343
Trial status: 1 RUNNING | 9 PENDING
Current time: 2026-02-11 23:39:02. Total running time: 9min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.369712836313248 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 RUNNING 1 32 0.0358363 2 9 560.918 2.36971 0.1054 │
│ train_cifar_83ced_00001 PENDING 16 8 0.00570343 4 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5035) [10, 4000] loss: 1.172
(func pid=5035) [10, 6000] loss: 0.781
(func pid=5035) [10, 8000] loss: 0.586
(func pid=5035) [10, 10000] loss: 0.469
(func pid=5035) [10, 12000] loss: 0.391
(func pid=5035) [10, 14000] loss: 0.335
(func pid=5035) [10, 16000] loss: 0.293
(func pid=5035) [10, 18000] loss: 0.260
(func pid=5035) [10, 20000] loss: 0.234
Trial train_cifar_83ced_00000 completed after 10 iterations at 2026-02-11 23:39:57. Total running time: 10min 26s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00000 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 61.90695 │
│ time_total_s 622.82469 │
│ training_iteration 10 │
│ accuracy 0.1054 │
│ loss 2.31272 │
╰────────────────────────────────────────────────────────────╯
(func pid=5035) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00000_0_batch_size=2,l1=1,l2=32,lr=0.0358_2026-02-11_23-29-30/checkpoint_000009)
Trial train_cifar_83ced_00001 started with configuration:
╭─────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00001 config │
├─────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 16 │
│ l2 8 │
│ lr 0.0057 │
╰─────────────────────────────────────────────────╯
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:40:02. Total running time: 10min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00000 with loss=2.3127221729278564 and params={'l1': 1, 'l2': 32, 'lr': 0.035836253144714454, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [1, 2000] loss: 2.159
(func pid=5810) [1, 4000] loss: 1.140
(func pid=5810) [1, 6000] loss: 0.730
(func pid=5810) [1, 8000] loss: 0.549
(func pid=5810) [1, 10000] loss: 0.460
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000000)
(func pid=5810) [2, 2000] loss: 2.306
(func pid=5810) [2, 4000] loss: 1.153
(func pid=5810) [2, 6000] loss: 0.769
(func pid=5810) [2, 8000] loss: 0.576
(func pid=5810) [2, 10000] loss: 0.461
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:41:02. Total running time: 11min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.312152072238922 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 1 33.4051 2.31215 0.0984 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000001)
(func pid=5810) [3, 2000] loss: 2.306
(func pid=5810) [3, 4000] loss: 1.153
(func pid=5810) [3, 6000] loss: 0.769
(func pid=5810) [3, 8000] loss: 0.576
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:41:32. Total running time: 12min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.306246109676361 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 2 64.9622 2.30625 0.0979 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [3, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000002)
(func pid=5810) [4, 2000] loss: 2.306
(func pid=5810) [4, 4000] loss: 1.153
(func pid=5810) [4, 6000] loss: 0.769
(func pid=5810) [4, 8000] loss: 0.576
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:42:02. Total running time: 12min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3038991704940797 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 3 96.6264 2.3039 0.0979 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [4, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000003)
(func pid=5810) [5, 2000] loss: 2.306
(func pid=5810) [5, 4000] loss: 1.153
(func pid=5810) [5, 6000] loss: 0.769
(func pid=5810) [5, 8000] loss: 0.576
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:42:32. Total running time: 13min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.306491449737549 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 4 128.244 2.30649 0.1014 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [5, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000004)
(func pid=5810) [6, 2000] loss: 2.306
(func pid=5810) [6, 4000] loss: 1.153
(func pid=5810) [6, 6000] loss: 0.768
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:43:02. Total running time: 13min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.305462149143219 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 5 159.574 2.30546 0.1021 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [6, 8000] loss: 0.576
(func pid=5810) [6, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000005)
(func pid=5810) [7, 2000] loss: 2.306
(func pid=5810) [7, 4000] loss: 1.153
(func pid=5810) [7, 6000] loss: 0.769
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:43:32. Total running time: 14min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.30500927400589 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 6 191.119 2.30501 0.1014 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [7, 8000] loss: 0.576
(func pid=5810) [7, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000006)
(func pid=5810) [8, 2000] loss: 2.305
(func pid=5810) [8, 4000] loss: 1.153
(func pid=5810) [8, 6000] loss: 0.769
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:44:02. Total running time: 14min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3038755395889283 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 7 222.677 2.30388 0.0998 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [8, 8000] loss: 0.576
(func pid=5810) [8, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000007)
(func pid=5810) [9, 2000] loss: 2.305
(func pid=5810) [9, 4000] loss: 1.153
(func pid=5810) [9, 6000] loss: 0.769
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:44:32. Total running time: 15min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3055198941230772 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 8 253.92 2.30552 0.0962 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [9, 8000] loss: 0.577
(func pid=5810) [9, 10000] loss: 0.461
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000008)
(func pid=5810) [10, 2000] loss: 2.306
(func pid=5810) [10, 4000] loss: 1.153
Trial status: 1 TERMINATED | 1 RUNNING | 8 PENDING
Current time: 2026-02-11 23:45:02. Total running time: 15min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3033587822914123 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00001 RUNNING 16 8 0.00570343 4 9 285.553 2.30336 0.0979 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00002 PENDING 8 64 0.00169369 2 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=5810) [10, 6000] loss: 0.769
(func pid=5810) [10, 8000] loss: 0.576
(func pid=5810) [10, 10000] loss: 0.461
Trial train_cifar_83ced_00001 completed after 10 iterations at 2026-02-11 23:45:18. Total running time: 15min 47s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00001 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 31.47603 │
│ time_total_s 317.02885 │
│ training_iteration 10 │
│ accuracy 0.0984 │
│ loss 2.306 │
╰────────────────────────────────────────────────────────────╯
(func pid=5810) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00001_1_batch_size=4,l1=16,l2=8,lr=0.0057_2026-02-11_23-29-30/checkpoint_000009)
Trial train_cifar_83ced_00002 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00002 config │
├──────────────────────────────────────────────────┤
│ batch_size 2 │
│ device cuda │
│ l1 8 │
│ l2 64 │
│ lr 0.00169 │
╰──────────────────────────────────────────────────╯
(func pid=6547) [1, 2000] loss: 2.146
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:45:32. Total running time: 16min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3060002333641054 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [1, 4000] loss: 0.985
(func pid=6547) [1, 6000] loss: 0.624
(func pid=6547) [1, 8000] loss: 0.453
(func pid=6547) [1, 10000] loss: 0.367
(func pid=6547) [1, 12000] loss: 0.302
(func pid=6547) [1, 14000] loss: 0.252
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:46:02. Total running time: 16min 31s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00001 with loss=2.3060002333641054 and params={'l1': 16, 'l2': 8, 'lr': 0.005703431073203665, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [1, 16000] loss: 0.223
(func pid=6547) [1, 18000] loss: 0.201
(func pid=6547) [1, 20000] loss: 0.177
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000000)
(func pid=6547) [2, 2000] loss: 1.818
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:46:32. Total running time: 17min 1s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.7541581141471863 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 1 63.9463 1.75416 0.36 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [2, 4000] loss: 0.904
(func pid=6547) [2, 6000] loss: 0.593
(func pid=6547) [2, 8000] loss: 0.443
(func pid=6547) [2, 10000] loss: 0.353
(func pid=6547) [2, 12000] loss: 0.302
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:47:02. Total running time: 17min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.7541581141471863 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 1 63.9463 1.75416 0.36 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [2, 14000] loss: 0.262
(func pid=6547) [2, 16000] loss: 0.222
(func pid=6547) [2, 18000] loss: 0.196
(func pid=6547) [2, 20000] loss: 0.178
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000001)
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:47:32. Total running time: 18min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.7881175030380487 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 2 125.606 1.78812 0.3389 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [3, 2000] loss: 1.792
(func pid=6547) [3, 4000] loss: 0.894
(func pid=6547) [3, 6000] loss: 0.580
(func pid=6547) [3, 8000] loss: 0.435
(func pid=6547) [3, 10000] loss: 0.361
(func pid=6547) [3, 12000] loss: 0.300
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:48:03. Total running time: 18min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.7881175030380487 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 2 125.606 1.78812 0.3389 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [3, 14000] loss: 0.256
(func pid=6547) [3, 16000] loss: 0.219
(func pid=6547) [3, 18000] loss: 0.198
(func pid=6547) [3, 20000] loss: 0.182
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000002)
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:48:33. Total running time: 19min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.8357078468129038 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 3 187.099 1.83571 0.3349 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [4, 2000] loss: 1.844
(func pid=6547) [4, 4000] loss: 0.909
(func pid=6547) [4, 6000] loss: 0.633
(func pid=6547) [4, 8000] loss: 0.473
(func pid=6547) [4, 10000] loss: 0.375
(func pid=6547) [4, 12000] loss: 0.315
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:49:03. Total running time: 19min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.8357078468129038 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 3 187.099 1.83571 0.3349 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [4, 14000] loss: 0.291
(func pid=6547) [4, 16000] loss: 0.247
(func pid=6547) [4, 18000] loss: 0.221
(func pid=6547) [4, 20000] loss: 0.196
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000003)
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:49:33. Total running time: 20min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.9853033395111561 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 4 249.025 1.9853 0.2559 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [5, 2000] loss: 1.973
(func pid=6547) [5, 4000] loss: 1.026
(func pid=6547) [5, 6000] loss: 0.690
(func pid=6547) [5, 8000] loss: 0.570
(func pid=6547) [5, 10000] loss: 0.447
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:50:03. Total running time: 20min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.9853033395111561 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 4 249.025 1.9853 0.2559 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [5, 12000] loss: 0.374
(func pid=6547) [5, 14000] loss: 0.308
(func pid=6547) [5, 16000] loss: 0.266
(func pid=6547) [5, 18000] loss: 0.243
(func pid=6547) [5, 20000] loss: 0.224
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:50:33. Total running time: 21min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=1.9853033395111561 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 4 249.025 1.9853 0.2559 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000004)
(func pid=6547) [6, 2000] loss: 2.298
(func pid=6547) [6, 4000] loss: 1.121
(func pid=6547) [6, 6000] loss: 0.750
(func pid=6547) [6, 8000] loss: 0.556
(func pid=6547) [6, 10000] loss: 0.457
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:51:03. Total running time: 21min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.1524145936369896 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 5 310.833 2.15241 0.1666 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [6, 12000] loss: 0.376
(func pid=6547) [6, 14000] loss: 0.323
(func pid=6547) [6, 16000] loss: 0.284
(func pid=6547) [6, 18000] loss: 0.250
(func pid=6547) [6, 20000] loss: 0.225
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:51:33. Total running time: 22min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.1524145936369896 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 5 310.833 2.15241 0.1666 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000005)
(func pid=6547) [7, 2000] loss: 2.247
(func pid=6547) [7, 4000] loss: 1.126
(func pid=6547) [7, 6000] loss: 0.747
(func pid=6547) [7, 8000] loss: 0.558
(func pid=6547) [7, 10000] loss: 0.448
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:52:03. Total running time: 22min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.252612409996986 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 6 372.979 2.25261 0.1283 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [7, 12000] loss: 0.374
(func pid=6547) [7, 14000] loss: 0.324
(func pid=6547) [7, 16000] loss: 0.277
(func pid=6547) [7, 18000] loss: 0.249
(func pid=6547) [7, 20000] loss: 0.230
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:52:33. Total running time: 23min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.252612409996986 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 6 372.979 2.25261 0.1283 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000006)
(func pid=6547) [8, 2000] loss: 2.299
(func pid=6547) [8, 4000] loss: 1.149
(func pid=6547) [8, 6000] loss: 0.765
(func pid=6547) [8, 8000] loss: 0.577
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:53:03. Total running time: 23min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.2984423138856886 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 7 434.95 2.29844 0.1099 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [8, 10000] loss: 0.461
(func pid=6547) [8, 12000] loss: 0.383
(func pid=6547) [8, 14000] loss: 0.329
(func pid=6547) [8, 16000] loss: 0.288
(func pid=6547) [8, 18000] loss: 0.255
(func pid=6547) [8, 20000] loss: 0.230
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:53:33. Total running time: 24min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.2984423138856886 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 7 434.95 2.29844 0.1099 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000007)
(func pid=6547) [9, 2000] loss: 2.334
(func pid=6547) [9, 4000] loss: 1.154
(func pid=6547) [9, 6000] loss: 0.769
(func pid=6547) [9, 8000] loss: 0.576
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:54:03. Total running time: 24min 32s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.297788763999939 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 8 496.969 2.29779 0.0998 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [9, 10000] loss: 0.461
(func pid=6547) [9, 12000] loss: 0.384
(func pid=6547) [9, 14000] loss: 0.329
(func pid=6547) [9, 16000] loss: 0.288
(func pid=6547) [9, 18000] loss: 0.256
(func pid=6547) [9, 20000] loss: 0.230
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:54:33. Total running time: 25min 2s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.297788763999939 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 8 496.969 2.29779 0.0998 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000008)
(func pid=6547) [10, 2000] loss: 2.305
(func pid=6547) [10, 4000] loss: 1.152
(func pid=6547) [10, 6000] loss: 0.768
(func pid=6547) [10, 8000] loss: 0.576
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:55:03. Total running time: 25min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.305638271808624 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 9 558.988 2.30564 0.0952 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [10, 10000] loss: 0.461
(func pid=6547) [10, 12000] loss: 0.384
(func pid=6547) [10, 14000] loss: 0.329
(func pid=6547) [10, 16000] loss: 0.288
(func pid=6547) [10, 18000] loss: 0.256
Trial status: 2 TERMINATED | 1 RUNNING | 7 PENDING
Current time: 2026-02-11 23:55:33. Total running time: 26min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.305638271808624 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00002 RUNNING 8 64 0.00169369 2 9 558.988 2.30564 0.0952 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00003 PENDING 32 4 0.000290065 4 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=6547) [10, 20000] loss: 0.231
Trial train_cifar_83ced_00002 completed after 10 iterations at 2026-02-11 23:55:43. Total running time: 26min 12s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00002 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 61.57878 │
│ time_total_s 620.56672 │
│ training_iteration 10 │
│ accuracy 0.0997 │
│ loss 2.30396 │
╰────────────────────────────────────────────────────────────╯
(func pid=6547) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00002_2_batch_size=2,l1=8,l2=64,lr=0.0017_2026-02-11_23-29-30/checkpoint_000009)
Trial train_cifar_83ced_00003 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00003 config │
├──────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 32 │
│ l2 4 │
│ lr 0.00029 │
╰──────────────────────────────────────────────────╯
(func pid=7314) [1, 2000] loss: 2.326
(func pid=7314) [1, 4000] loss: 1.155
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:56:03. Total running time: 26min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00002 with loss=2.3039642276763916 and params={'l1': 8, 'l2': 64, 'lr': 0.0016936902479707736, 'batch_size': 2, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [1, 6000] loss: 0.768
(func pid=7314) [1, 8000] loss: 0.576
(func pid=7314) [1, 10000] loss: 0.460
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000000)
(func pid=7314) [2, 2000] loss: 2.289
(func pid=7314) [2, 4000] loss: 1.113
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:56:34. Total running time: 27min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=2.2987527786254884 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 1 33.5767 2.29875 0.0967 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [2, 6000] loss: 0.708
(func pid=7314) [2, 8000] loss: 0.512
(func pid=7314) [2, 10000] loss: 0.400
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000001)
(func pid=7314) [3, 2000] loss: 1.962
(func pid=7314) [3, 4000] loss: 0.950
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:57:04. Total running time: 27min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.9610728846549987 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 2 65.1847 1.96107 0.2186 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [3, 6000] loss: 0.608
(func pid=7314) [3, 8000] loss: 0.437
(func pid=7314) [3, 10000] loss: 0.343
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000002)
(func pid=7314) [4, 2000] loss: 1.676
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:57:34. Total running time: 28min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.6797103251218797 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 3 96.4278 1.67971 0.3521 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [4, 4000] loss: 0.823
(func pid=7314) [4, 6000] loss: 0.535
(func pid=7314) [4, 8000] loss: 0.399
(func pid=7314) [4, 10000] loss: 0.311
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000003)
(func pid=7314) [5, 2000] loss: 1.526
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:58:04. Total running time: 28min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.5487382731795312 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 4 127.956 1.54874 0.4145 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [5, 4000] loss: 0.749
(func pid=7314) [5, 6000] loss: 0.492
(func pid=7314) [5, 8000] loss: 0.363
(func pid=7314) [5, 10000] loss: 0.290
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000004)
(func pid=7314) [6, 2000] loss: 1.424
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:58:34. Total running time: 29min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.4263317683815957 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 5 159.539 1.42633 0.4775 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [6, 4000] loss: 0.689
(func pid=7314) [6, 6000] loss: 0.456
(func pid=7314) [6, 8000] loss: 0.343
(func pid=7314) [6, 10000] loss: 0.274
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000005)
(func pid=7314) [7, 2000] loss: 1.328
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:59:04. Total running time: 29min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.3459896661221982 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 6 190.771 1.34599 0.5124 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [7, 4000] loss: 0.652
(func pid=7314) [7, 6000] loss: 0.435
(func pid=7314) [7, 8000] loss: 0.328
(func pid=7314) [7, 10000] loss: 0.260
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000006)
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-11 23:59:34. Total running time: 30min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.307684950852394 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 7 222.432 1.30768 0.5262 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [8, 2000] loss: 1.266
(func pid=7314) [8, 4000] loss: 0.634
(func pid=7314) [8, 6000] loss: 0.420
(func pid=7314) [8, 8000] loss: 0.307
(func pid=7314) [8, 10000] loss: 0.250
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000007)
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-12 00:00:04. Total running time: 30min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.2625379889041186 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 8 254.048 1.26254 0.5474 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [9, 2000] loss: 1.197
(func pid=7314) [9, 4000] loss: 0.603
(func pid=7314) [9, 6000] loss: 0.404
(func pid=7314) [9, 8000] loss: 0.307
(func pid=7314) [9, 10000] loss: 0.237
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000008)
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-12 00:00:34. Total running time: 31min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.2578392902702094 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 9 285.713 1.25784 0.5534 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=7314) [10, 2000] loss: 1.144
(func pid=7314) [10, 4000] loss: 0.583
(func pid=7314) [10, 6000] loss: 0.393
(func pid=7314) [10, 8000] loss: 0.293
(func pid=7314) [10, 10000] loss: 0.238
Trial status: 3 TERMINATED | 1 RUNNING | 6 PENDING
Current time: 2026-02-12 00:01:04. Total running time: 31min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.2578392902702094 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00003 RUNNING 32 4 0.000290065 4 9 285.713 1.25784 0.5534 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00004 PENDING 256 1 0.000410329 4 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_83ced_00003 completed after 10 iterations at 2026-02-12 00:01:04. Total running time: 31min 34s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00003 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 31.3912 │
│ time_total_s 317.10417 │
│ training_iteration 10 │
│ accuracy 0.5682 │
│ loss 1.19943 │
╰────────────────────────────────────────────────────────────╯
(func pid=7314) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00003_3_batch_size=4,l1=32,l2=4,lr=0.0003_2026-02-11_23-29-30/checkpoint_000009)
Trial train_cifar_83ced_00004 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00004 config │
├──────────────────────────────────────────────────┤
│ batch_size 4 │
│ device cuda │
│ l1 256 │
│ l2 1 │
│ lr 0.00041 │
╰──────────────────────────────────────────────────╯
(func pid=8051) [1, 2000] loss: 2.341
(func pid=8051) [1, 4000] loss: 1.122
(func pid=8051) [1, 6000] loss: 0.719
(func pid=8051) [1, 8000] loss: 0.518
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:01:34. Total running time: 32min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [1, 10000] loss: 0.400
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000000)
(func pid=8051) [2, 2000] loss: 1.970
(func pid=8051) [2, 4000] loss: 0.971
(func pid=8051) [2, 6000] loss: 0.645
(func pid=8051) [2, 8000] loss: 0.481
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:02:04. Total running time: 32min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 1 33.7152 1.96564 0.201 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [2, 10000] loss: 0.382
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000001)
(func pid=8051) [3, 2000] loss: 1.900
(func pid=8051) [3, 4000] loss: 0.944
(func pid=8051) [3, 6000] loss: 0.630
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:02:34. Total running time: 33min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 2 65.1986 1.88458 0.2085 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [3, 8000] loss: 0.468
(func pid=8051) [3, 10000] loss: 0.376
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000002)
(func pid=8051) [4, 2000] loss: 1.857
(func pid=8051) [4, 4000] loss: 0.928
(func pid=8051) [4, 6000] loss: 0.617
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:03:04. Total running time: 33min 33s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 3 96.9261 1.87799 0.2107 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [4, 8000] loss: 0.462
(func pid=8051) [4, 10000] loss: 0.370
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000003)
(func pid=8051) [5, 2000] loss: 1.839
(func pid=8051) [5, 4000] loss: 0.911
(func pid=8051) [5, 6000] loss: 0.607
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:03:34. Total running time: 34min 3s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 4 128.827 1.84098 0.2348 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [5, 8000] loss: 0.458
(func pid=8051) [5, 10000] loss: 0.362
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000004)
(func pid=8051) [6, 2000] loss: 1.807
(func pid=8051) [6, 4000] loss: 0.895
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:04:04. Total running time: 34min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 5 160.718 1.81312 0.24 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [6, 6000] loss: 0.603
(func pid=8051) [6, 8000] loss: 0.448
(func pid=8051) [6, 10000] loss: 0.362
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000005)
(func pid=8051) [7, 2000] loss: 1.776
(func pid=8051) [7, 4000] loss: 0.889
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:04:35. Total running time: 35min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 6 192.62 1.78901 0.2568 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [7, 6000] loss: 0.594
(func pid=8051) [7, 8000] loss: 0.444
(func pid=8051) [7, 10000] loss: 0.354
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000006)
(func pid=8051) [8, 2000] loss: 1.743
(func pid=8051) [8, 4000] loss: 0.878
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:05:05. Total running time: 35min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 7 224.607 1.79808 0.2625 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [8, 6000] loss: 0.585
(func pid=8051) [8, 8000] loss: 0.440
(func pid=8051) [8, 10000] loss: 0.354
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000007)
(func pid=8051) [9, 2000] loss: 1.730
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:05:35. Total running time: 36min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 8 256.403 1.8384 0.2597 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [9, 4000] loss: 0.859
(func pid=8051) [9, 6000] loss: 0.578
(func pid=8051) [9, 8000] loss: 0.435
(func pid=8051) [9, 10000] loss: 0.349
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000008)
(func pid=8051) [10, 2000] loss: 1.694
Trial status: 4 TERMINATED | 1 RUNNING | 5 PENDING
Current time: 2026-02-12 00:06:05. Total running time: 36min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00004 RUNNING 256 1 0.000410329 4 9 288.163 1.76916 0.2709 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00005 PENDING 16 128 0.00125962 16 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8051) [10, 4000] loss: 0.857
(func pid=8051) [10, 6000] loss: 0.576
(func pid=8051) [10, 8000] loss: 0.428
(func pid=8051) [10, 10000] loss: 0.344
Trial train_cifar_83ced_00004 completed after 10 iterations at 2026-02-12 00:06:28. Total running time: 36min 58s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00004 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 31.88895 │
│ time_total_s 320.05146 │
│ training_iteration 10 │
│ accuracy 0.2645 │
│ loss 1.88965 │
╰────────────────────────────────────────────────────────────╯
(func pid=8051) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00004_4_batch_size=4,l1=256,l2=1,lr=0.0004_2026-02-11_23-29-30/checkpoint_000009)
Trial train_cifar_83ced_00005 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00005 config │
├──────────────────────────────────────────────────┤
│ batch_size 16 │
│ device cuda │
│ l1 16 │
│ l2 128 │
│ lr 0.00126 │
╰──────────────────────────────────────────────────╯
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-02-12 00:06:35. Total running time: 37min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00005 RUNNING 16 128 0.00125962 16 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8788) [1, 2000] loss: 1.873
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000000)
(func pid=8788) [2, 2000] loss: 1.460
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000001)
(func pid=8788) [3, 2000] loss: 1.329
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000002)
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-02-12 00:07:05. Total running time: 37min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00003 with loss=1.1994330612689257 and params={'l1': 32, 'l2': 4, 'lr': 0.0002900647405642859, 'batch_size': 4, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00005 RUNNING 16 128 0.00125962 16 3 28.6857 1.27627 0.5439 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8788) [4, 2000] loss: 1.229
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000003)
(func pid=8788) [5, 2000] loss: 1.180
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000004)
(func pid=8788) [6, 2000] loss: 1.128
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000005)
(func pid=8788) [7, 2000] loss: 1.103
Trial status: 5 TERMINATED | 1 RUNNING | 4 PENDING
Current time: 2026-02-12 00:07:35. Total running time: 38min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.1730882787704469 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00005 RUNNING 16 128 0.00125962 16 6 55.4265 1.17309 0.5856 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000006)
(func pid=8788) [8, 2000] loss: 1.062
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000007)
(func pid=8788) [9, 2000] loss: 1.046
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000008)
(func pid=8788) [10, 2000] loss: 1.027
Trial train_cifar_83ced_00005 completed after 10 iterations at 2026-02-12 00:08:04. Total running time: 38min 33s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00005 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 8.9655 │
│ time_total_s 91.43223 │
│ training_iteration 10 │
│ accuracy 0.6138 │
│ loss 1.1101 │
╰────────────────────────────────────────────────────────────╯
(func pid=8788) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00005_5_batch_size=16,l1=16,l2=128,lr=0.0013_2026-02-11_23-29-30/checkpoint_000009)
Trial status: 6 TERMINATED | 4 PENDING
Current time: 2026-02-12 00:08:05. Total running time: 38min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 PENDING 4 64 0.000106216 8 │
│ train_cifar_83ced_00007 PENDING 1 2 0.000251581 8 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_83ced_00006 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00006 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 4 │
│ l2 64 │
│ lr 0.00011 │
╰──────────────────────────────────────────────────╯
(func pid=9503) [1, 2000] loss: 2.305
(func pid=9503) [1, 4000] loss: 1.145
Trial train_cifar_83ced_00006 completed after 1 iterations at 2026-02-12 00:08:27. Total running time: 38min 56s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00006 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 18.24235 │
│ time_total_s 18.24235 │
│ training_iteration 1 │
│ accuracy 0.1469 │
│ loss 2.22293 │
╰────────────────────────────────────────────────────────────╯
(func pid=9503) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00006_6_batch_size=8,l1=4,l2=64,lr=0.0001_2026-02-11_23-29-30/checkpoint_000000)
Trial train_cifar_83ced_00007 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00007 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 1 │
│ l2 2 │
│ lr 0.00025 │
╰──────────────────────────────────────────────────╯
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-02-12 00:08:35. Total running time: 39min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00007 RUNNING 1 2 0.000251581 8 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9634) [1, 2000] loss: 2.329
(func pid=9634) [1, 4000] loss: 1.125
(func pid=9634) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00007_7_batch_size=8,l1=1,l2=2,lr=0.0003_2026-02-11_23-29-30/checkpoint_000000)
(func pid=9634) [2, 2000] loss: 2.103
(func pid=9634) [2, 4000] loss: 1.004
Trial status: 7 TERMINATED | 1 RUNNING | 2 PENDING
Current time: 2026-02-12 00:09:05. Total running time: 39min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00007 RUNNING 1 2 0.000251581 8 1 18.5527 2.15209 0.1805 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00008 PENDING 32 128 0.0536025 8 │
│ train_cifar_83ced_00009 PENDING 8 8 0.00116086 8 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Trial train_cifar_83ced_00007 completed after 2 iterations at 2026-02-12 00:09:06. Total running time: 39min 35s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00007 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000001 │
│ time_this_iter_s 16.722 │
│ time_total_s 35.27466 │
│ training_iteration 2 │
│ accuracy 0.19 │
│ loss 1.96242 │
╰────────────────────────────────────────────────────────────╯
(func pid=9634) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00007_7_batch_size=8,l1=1,l2=2,lr=0.0003_2026-02-11_23-29-30/checkpoint_000001)
Trial train_cifar_83ced_00008 started with configuration:
╭─────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00008 config │
├─────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 32 │
│ l2 128 │
│ lr 0.0536 │
╰─────────────────────────────────────────────────╯
(func pid=9831) [1, 2000] loss: 2.238
(func pid=9831) [1, 4000] loss: 1.145
Trial train_cifar_83ced_00008 completed after 1 iterations at 2026-02-12 00:09:28. Total running time: 39min 57s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00008 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000000 │
│ time_this_iter_s 18.64769 │
│ time_total_s 18.64769 │
│ training_iteration 1 │
│ accuracy 0.1125 │
│ loss 2.28543 │
╰────────────────────────────────────────────────────────────╯
(func pid=9831) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00008_8_batch_size=8,l1=32,l2=128,lr=0.0536_2026-02-11_23-29-30/checkpoint_000000)
Trial train_cifar_83ced_00009 started with configuration:
╭──────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00009 config │
├──────────────────────────────────────────────────┤
│ batch_size 8 │
│ device cuda │
│ l1 8 │
│ l2 8 │
│ lr 0.00116 │
╰──────────────────────────────────────────────────╯
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:09:35. Total running time: 40min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) [1, 2000] loss: 2.123
(func pid=9962) [1, 4000] loss: 0.876
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000000)
(func pid=9962) [2, 2000] loss: 1.594
(func pid=9962) [2, 4000] loss: 0.772
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:10:05. Total running time: 40min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 1 18.645 1.59356 0.3914 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000001)
(func pid=9962) [3, 2000] loss: 1.442
(func pid=9962) [3, 4000] loss: 0.704
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000002)
(func pid=9962) [4, 2000] loss: 1.350
(func pid=9962) [4, 4000] loss: 0.670
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:10:35. Total running time: 41min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 3 51.1745 1.35135 0.51 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000003)
(func pid=9962) [5, 2000] loss: 1.291
(func pid=9962) [5, 4000] loss: 0.639
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000004)
(func pid=9962) [6, 2000] loss: 1.245
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:11:05. Total running time: 41min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 5 83.932 1.29291 0.5312 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) [6, 4000] loss: 0.623
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000005)
(func pid=9962) [7, 2000] loss: 1.214
(func pid=9962) [7, 4000] loss: 0.606
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000006)
(func pid=9962) [8, 2000] loss: 1.183
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:11:35. Total running time: 42min 4s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 7 116.654 1.28741 0.5368 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) [8, 4000] loss: 0.593
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000007)
(func pid=9962) [9, 2000] loss: 1.151
(func pid=9962) [9, 4000] loss: 0.585
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000008)
Trial status: 9 TERMINATED | 1 RUNNING
Current time: 2026-02-12 00:12:05. Total running time: 42min 34s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00009 RUNNING 8 8 0.00116086 8 9 149.494 1.182 0.581 │
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
(func pid=9962) [10, 2000] loss: 1.129
(func pid=9962) [10, 4000] loss: 0.570
Trial train_cifar_83ced_00009 completed after 10 iterations at 2026-02-12 00:12:18. Total running time: 42min 47s
╭────────────────────────────────────────────────────────────╮
│ Trial train_cifar_83ced_00009 result │
├────────────────────────────────────────────────────────────┤
│ checkpoint_dir_name checkpoint_000009 │
│ time_this_iter_s 16.3993 │
│ time_total_s 165.8929 │
│ training_iteration 10 │
│ accuracy 0.5452 │
│ loss 1.27621 │
╰────────────────────────────────────────────────────────────╯
(func pid=9962) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30/train_cifar_83ced_00009_9_batch_size=8,l1=8,l2=8,lr=0.0012_2026-02-11_23-29-30/checkpoint_000009)
2026-02-12 00:12:18,836 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30' in 0.0106s.
Trial status: 10 TERMINATED
Current time: 2026-02-12 00:12:18. Total running time: 42min 48s
Logical resource usage: 2.0/16 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:A10G)
Current best trial: 83ced_00005 with loss=1.110095292186737 and params={'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name status l1 l2 lr batch_size iter total time (s) loss accuracy │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ train_cifar_83ced_00000 TERMINATED 1 32 0.0358363 2 10 622.825 2.31272 0.1054 │
│ train_cifar_83ced_00001 TERMINATED 16 8 0.00570343 4 10 317.029 2.306 0.0984 │
│ train_cifar_83ced_00002 TERMINATED 8 64 0.00169369 2 10 620.567 2.30396 0.0997 │
│ train_cifar_83ced_00003 TERMINATED 32 4 0.000290065 4 10 317.104 1.19943 0.5682 │
│ train_cifar_83ced_00004 TERMINATED 256 1 0.000410329 4 10 320.051 1.88965 0.2645 │
│ train_cifar_83ced_00005 TERMINATED 16 128 0.00125962 16 10 91.4322 1.1101 0.6138 │
│ train_cifar_83ced_00006 TERMINATED 4 64 0.000106216 8 1 18.2423 2.22293 0.1469 │
│ train_cifar_83ced_00007 TERMINATED 1 2 0.000251581 8 2 35.2747 1.96242 0.19 │
│ train_cifar_83ced_00008 TERMINATED 32 128 0.0536025 8 1 18.6477 2.28543 0.1125 │
│ train_cifar_83ced_00009 TERMINATED 8 8 0.00116086 8 10 165.893 1.27621 0.5452 │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Best trial config: {'l1': 16, 'l2': 128, 'lr': 0.0012596197781973224, 'batch_size': 16, 'device': 'cuda'}
Best trial final validation loss: 1.110095292186737
Best trial final validation accuracy: 0.6138
Best trial test set accuracy: 0.6093
Results#
Your Ray Tune trial summary output will look something like the following. The table summarizes the validation performance of each trial and highlights the best hyperparameter configuration:
Number of trials: 10/10 (10 TERMINATED)
+-----+--------------+------+------+-------------+--------+---------+------------+
| ... | batch_size | l1 | l2 | lr | iter | loss | accuracy |
|-----+--------------+------+------+-------------+--------+---------+------------|
| ... | 2 | 1 | 256 | 0.000668163 | 1 | 2.31479 | 0.0977 |
| ... | 4 | 64 | 8 | 0.0331514 | 1 | 2.31605 | 0.0983 |
| ... | 4 | 2 | 1 | 0.000150295 | 1 | 2.30755 | 0.1023 |
| ... | 16 | 32 | 32 | 0.0128248 | 10 | 1.66912 | 0.4391 |
| ... | 4 | 8 | 128 | 0.00464561 | 2 | 1.7316 | 0.3463 |
| ... | 8 | 256 | 8 | 0.00031556 | 1 | 2.19409 | 0.1736 |
| ... | 4 | 16 | 256 | 0.00574329 | 2 | 1.85679 | 0.3368 |
| ... | 8 | 2 | 2 | 0.00325652 | 1 | 2.30272 | 0.0984 |
| ... | 2 | 2 | 2 | 0.000342987 | 2 | 1.76044 | 0.292 |
| ... | 4 | 64 | 32 | 0.003734 | 8 | 1.53101 | 0.4761 |
+-----+--------------+------+------+-------------+--------+---------+------------+
Best trial config: {'l1': 64, 'l2': 32, 'lr': 0.0037339984519545164, 'batch_size': 4}
Best trial final validation loss: 1.5310075663924216
Best trial final validation accuracy: 0.4761
Best trial test set accuracy: 0.4737
Most trials were stopped early by the ASHA scheduler to conserve resources. The best-performing trial reached a validation accuracy of approximately 47%, which was confirmed on the test set.
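ASHA stops trials at fixed "rungs": after `grace_period` iterations, and then again each time the iteration count grows by a factor of `reduction_factor`, only the top fraction of trials is allowed to continue. The stopping points visible in the tables above (1, 2, and 10 iterations) are consistent with `grace_period=1`, `reduction_factor=2`, and `max_t=10`; those values are assumptions for this illustration, not taken from this section. A plain-Python sketch of the rung schedule (this is not Ray code):

```python
def asha_rungs(grace_period: int, reduction_factor: int, max_t: int) -> list[int]:
    """Iterations at which ASHA decides whether to stop or promote a trial.

    At each rung, roughly only the top 1/reduction_factor of trials seen so
    far are allowed to continue; the rest are terminated early.
    """
    rungs = []
    r = grace_period
    while r < max_t:
        rungs.append(r)
        r *= reduction_factor
    return rungs


# With the assumed settings, trials face stop/continue decisions at
# iterations 1, 2, 4, and 8, and surviving trials run to max_t=10.
print(asha_rungs(grace_period=1, reduction_factor=2, max_t=10))  # → [1, 2, 4, 8]
```

This explains why poorly performing trials in the tables above show only 1 or 2 iterations, while promising ones ran for the full 10.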
Observability#
Monitoring is critical when running large-scale experiments. Ray provides a dashboard that lets you view the status of your trials, check cluster resource use, and inspect logs in real time.
For debugging, Ray also offers distributed debugging tools that let you attach a debugger to running trials across the cluster.
Conclusion#
In this tutorial, you learned how to tune the hyperparameters of a PyTorch model using Ray Tune: integrating Ray Tune into a PyTorch training loop, defining a hyperparameter search space, using an efficient early-stopping scheduler (ASHA) to terminate low-performing trials, saving checkpoints and reporting metrics to Ray Tune, and running the search and analyzing the results.
Ray Tune makes it straightforward to scale your experiments from a single machine to a large cluster, helping you find the best model configuration efficiently.
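Because each trial reported checkpoints, the best model can be recovered after the run finishes. The following is a minimal sketch, assuming the experiment directory from the logs above, the `train_cifar` training function defined earlier in the tutorial, and that the training function saved its state under the file name `checkpoint.pt` (the path and file name here are illustrative, not confirmed by this section):

```python
import os

import torch
from ray import tune

# Hypothetical experiment path for illustration; substitute your own run.
experiment_path = "/var/lib/ci-user/ray_results/train_cifar_2026-02-11_23-29-30"

# Re-attach to the finished experiment; train_cifar is the training
# function defined earlier in this tutorial.
tuner = tune.Tuner.restore(experiment_path, trainable=train_cifar)
results = tuner.get_results()

# Pick the trial with the lowest reported validation loss.
best_result = results.get_best_result(metric="loss", mode="min")
print(best_result.config)

# Load the model state from that trial's checkpoint. The file name
# "checkpoint.pt" is an assumption about what the training function saved.
with best_result.checkpoint.as_directory() as checkpoint_dir:
    state = torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
```

This is the same selection logic that produces the "Current best trial" line in the status tables above.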
Further reading#
Total running time of the script: (43 minutes 1.740 seconds)