# Experiments

Track your machine learning experiments with detailed logging and comparison tools.

![Experiments View](/files/jewgMIgHwtRSCunbmmPK)

## Creating an Experiment

Navigate to **Deep Learning Platform** → **Experiments** → Click **Create**

![Create Experiment Form](/files/Ce99i8TqgR4FL2Ai5P0W)

### Basic Information

**Experiment Name**\* (Required)

* Enter a descriptive name for the experiment
* Example: `image-classification-resnet`, `nlp-sentiment-bert`

**Description** (Optional)

* Detailed description of the experiment's purpose and goals

**Framework**\* (Required)

* Select your ML framework from the dropdown:
  * PyTorch
  * TensorFlow
  * Scikit-learn
  * Keras
  * Others
* Default: `pytorch`

**Task Type**\* (Required)

* Select the ML task type:
  * Classification
  * Regression
  * Detection
  * Segmentation
  * Others
* Default: `classification`

**Model Type**\* (Required)

* Specify the model architecture
* Examples: ResNet, BERT, YOLO, Custom

**Project ID** (Optional)

* Link experiment to a specific project
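
If you prepare experiments in scripts before entering them in the form, the Basic Information fields map naturally onto a small record type. The following is a minimal Python sketch; the class name `BasicInfo` and its snake_case field names are hypothetical and not part of the platform's API. Only the required/optional split and the defaults (`pytorch`, `classification`) come from the form described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasicInfo:
    """Hypothetical mirror of the Basic Information form section."""

    name: str                          # Experiment Name (required)
    model_type: str                    # Model Type (required), e.g. "ResNet"
    framework: str = "pytorch"         # form default
    task_type: str = "classification"  # form default
    description: Optional[str] = None  # optional
    project_id: Optional[str] = None   # optional link to a project

    def __post_init__(self) -> None:
        # Reject blank values for the required free-text fields.
        if not self.name.strip():
            raise ValueError("Experiment Name is required")
        if not self.model_type.strip():
            raise ValueError("Model Type is required")


info = BasicInfo(name="image-classification-resnet", model_type="ResNet")
print(info.framework, info.task_type)  # pytorch classification
```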

### Training Configuration

**Epochs**\* (Required)

* Number of training epochs
* Example: `100`

**Batch Size** (Optional)

* Number of samples per training batch
* Example: `32`

**Learning Rate**\* (Required)

* Initial learning rate for training
* Example: `0.001`
* Helper text: "Valid learning rate"

**Optimizer**\* (Required)

* Select the optimizer from the dropdown:
  * Adam
  * SGD
  * RMSprop
  * Others
* Default: `adam`

**Loss Function**\* (Required)

* Select loss function:
  * Categorical Crossentropy
  * Binary Crossentropy
  * MSE
  * Others
* Default: `categorical_crossentropy`
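
The form does not document exactly what counts as a valid learning rate, so the sketch below assumes the simplest reading: epochs, learning rate, and (when set) batch size must all be positive. `TrainingConfig` and `validate` are hypothetical names used only for illustration; the optimizer and loss defaults mirror the form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingConfig:
    """Hypothetical mirror of the Training Configuration form section."""

    epochs: int                                      # required
    learning_rate: float                             # required
    batch_size: Optional[int] = None                 # optional
    optimizer: str = "adam"                          # form default
    loss_function: str = "categorical_crossentropy"  # form default


def validate(cfg: TrainingConfig) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    errors: List[str] = []
    if cfg.epochs <= 0:
        errors.append("Epochs must be a positive integer")
    if not cfg.learning_rate > 0:  # written this way so NaN is rejected too
        errors.append("Learning Rate must be greater than 0")
    if cfg.batch_size is not None and cfg.batch_size <= 0:
        errors.append("Batch size must be a positive integer when set")
    return errors


print(validate(TrainingConfig(epochs=100, learning_rate=0.001)))  # []
print(validate(TrainingConfig(epochs=0, learning_rate=-1.0)))
```

Returning a list of messages instead of raising on the first problem lets a script report every invalid field at once.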

### Environment & Resources

**Python Version**\* (Required)

* Select Python version:
  * Python 3.9
  * Python 3.10
  * Python 3.11
* Default: `python`

**GPU Required** (Checkbox)

* Check if GPU is required for training

**Memory Requirement (GB)**\* (Required)

* Required memory in GB
* Example: `8`, `16`, `32`

**CPU Cores**\* (Required)

* Number of CPU cores needed
* Example: `4`, `8`, `16`
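
The Environment & Resources fields can also be sanity-checked in a script before submission. This sketch assumes the Python versions listed in the dropdown are the only accepted values and that memory and CPU counts must be positive whole numbers; `ResourceRequest` and `SUPPORTED_PYTHON` are hypothetical names.

```python
from dataclasses import dataclass

# Versions offered in the Python Version dropdown above.
SUPPORTED_PYTHON = {"3.9", "3.10", "3.11"}


@dataclass
class ResourceRequest:
    """Hypothetical mirror of the Environment & Resources form section."""

    python_version: str
    memory_gb: int
    cpu_cores: int
    gpu_required: bool = False

    def __post_init__(self) -> None:
        if self.python_version not in SUPPORTED_PYTHON:
            raise ValueError(f"unsupported Python version: {self.python_version}")
        # Assumption: both values must be positive whole numbers.
        for label, value in (("Memory (GB)", self.memory_gb), ("CPU cores", self.cpu_cores)):
            if not isinstance(value, int) or value <= 0:
                raise ValueError(f"{label} must be a positive integer, got {value!r}")


request = ResourceRequest(python_version="3.10", memory_gb=16, cpu_cores=8, gpu_required=True)
print(request)
```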

### Metadata

**Tags** (Optional)

* Comma-separated tags for organizing experiments
* Example: `computer-vision, production, baseline`
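
When tags are managed in scripts, a comma-separated value like the example above has to be split into individual tags. A minimal sketch, assuming that surrounding whitespace and empty entries carry no meaning and that duplicates should be dropped; `parse_tags` is a hypothetical helper, not a platform function.

```python
from typing import List


def parse_tags(raw: str) -> List[str]:
    """Split a comma-separated Tags value into clean, de-duplicated tags.

    Whitespace around each tag is trimmed, empty entries are skipped, and the
    order of first occurrence is preserved.
    """
    seen = set()
    tags: List[str] = []
    for part in raw.split(","):
        tag = part.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


print(parse_tags("computer-vision, production, baseline"))
# ['computer-vision', 'production', 'baseline']
```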

**Notes** (Optional)

* Additional notes or comments about the experiment

**Public Experiment** (Checkbox)

* Make experiment visible to all organization members

### Actions

* **Cancel**: Discard and close the form
* **Create Experiment**: Submit and create the experiment

## Example Configuration

```yaml
Experiment Name: resnet50-imagenet-baseline
Description: Baseline training of ResNet50 on ImageNet dataset
Framework: PyTorch
Task Type: Classification
Model Type: ResNet50
Epochs: 90
Learning Rate: 0.1
Optimizer: SGD
Loss Function: Categorical Crossentropy
Python Version: Python 3.9
GPU Required: ✓
Memory Requirement: 32 GB
CPU Cores: 16
Tags: computer-vision, classification, baseline
Public Experiment: ✓
```

## Viewing Experiment Details

To view detailed information about an experiment:

1. Navigate to **Deep Learning Platform** → **Experiments**
2. Click on an experiment from the list
3. View comprehensive details in the modal dialog

![View Experiment Details](/files/z9O8vZ0K4Y2Q7LQGxYzU)

**Details Panel Sections**:

* **Basic Information**:
  * Experiment Name: e.g., "Image Classification CNN"
  * Description: Full description of the experiment
  * Framework: TensorFlow, PyTorch, etc.
  * Task Type: Classification, Regression, etc.
  * Model Type: CNN, ResNet, Custom, etc.
  * Project ID: Associated project
* **Training Configuration**:
  * Epochs: Number of training epochs (e.g., 100)
  * Batch Size: Training batch size (e.g., 32)
  * Learning Rate: Initial learning rate (e.g., 0.001)
  * Optimizer: Adam, SGD, etc.
  * Loss Function: Categorical Crossentropy, MSE, etc.
* **Environment & Resources**:
  * Python Version: e.g., Python 3.9
  * GPU Required: Checkbox status
  * Memory Requirement (GB): e.g., 8 GB
  * CPU Cores: e.g., 4 cores
* **Metadata**:
  * Tags: Comma-separated tags
  * Notes: Additional notes
  * Public Experiment: Visibility status
  * Creator and timestamps

## Editing an Experiment

To modify an experiment configuration:

1. Open the experiment details
2. Click the **Edit** button (or open the three-dot menu and select **Edit**)
3. Modify editable fields in the Edit Experiment modal

![Edit Experiment Form](/files/Gw6JXnxISBJmcrruBTZr)

4. Click **Update Experiment** to save changes

> \[!NOTE] The Edit form mirrors the details view, but its fields become editable and an **Update Experiment** button appears alongside **Cancel**.

> \[!NOTE] You cannot edit core configuration (framework, resources, hyperparameters) of a running or completed experiment. To try different settings, clone the experiment instead.

**Editable Fields**:

* ✅ Description
* ✅ Tags
* ✅ Notes
* ✅ Public/Private status
* ❌ Framework (cannot edit)
* ❌ Resources (cannot edit while running)
* ❌ Hyperparameters (cannot edit)

## Cloning an Experiment

To create a copy of an experiment with modified settings:

1. Open experiment details
2. Click the **Clone** button
3. Modify configuration as needed
4. Give it a new name
5. Click **Create Experiment**

**Use Cases**:

* Try different hyperparameters
* Run with more/less resources
* Test on different datasets
* Reproduce results
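
Conceptually, cloning copies an existing configuration, assigns a new name, and applies your changes, leaving the original experiment untouched. The sketch below models that with the standard library's `dataclasses.replace`; `ExperimentConfig` and `clone` are hypothetical and cover only a handful of fields.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical, trimmed-down experiment configuration."""

    name: str
    framework: str
    learning_rate: float
    epochs: int
    memory_gb: int


def clone(source: ExperimentConfig, new_name: str, **changes) -> ExperimentConfig:
    """Copy `source` under `new_name`, overriding any fields passed in `changes`."""
    if new_name == source.name:
        raise ValueError("a clone needs a different name")
    # replace() builds a new instance; the source object is left untouched.
    return replace(source, name=new_name, **changes)


base = ExperimentConfig("resnet50-imagenet-baseline-v1", "pytorch", 0.1, 90, 32)
variant = clone(base, "resnet50-imagenet-lowlr-v1", learning_rate=0.01)
print(variant.name, variant.learning_rate, base.learning_rate)
# resnet50-imagenet-lowlr-v1 0.01 0.1
```

Making the dataclass `frozen` means neither copy can be mutated in place, which mirrors the rule that core settings of an existing experiment are not edited directly.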

## Deleting an Experiment

To remove an experiment:

1. Open the experiment details, or locate the experiment in the list
2. Click the **Delete** button (trash icon)
3. Confirm the deletion in the dialog
4. The experiment and its associated data are removed

> \[!WARNING] Deleting an experiment permanently removes:
>
> * Experiment configuration
> * Training logs
> * Metrics and charts
> * Saved checkpoints (unless linked to a registered model)
>
> This action cannot be undone.

**Before Deleting**:

* Export important logs or metrics
* Register any valuable models
* Download artifacts if needed
* Verify you have the correct experiment selected

## Monitoring Experiments

Once an experiment is submitted, you can track it as follows:

**Real-time Monitoring**

* View live logs
* Monitor resource utilization (CPU, GPU, memory)
* Track metrics as they're logged
* Receive alerts on failures

**Experiment Status**

* **Pending**: Waiting for resources
* **Running**: Currently executing
* **Completed**: Finished successfully
* **Failed**: Encountered an error
* **Stopped**: Manually stopped
* **Cancelled**: Cancelled before starting
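
The status descriptions imply a simple lifecycle. The transition table below is an assumption inferred from those descriptions (for example, that only a pending experiment can be cancelled and only a running one can be stopped), not the platform's documented state machine; the lowercase status strings are also hypothetical.

```python
# Assumed lifecycle, inferred from the status descriptions above; it is not
# the platform's documented state machine.
TRANSITIONS = {
    "pending": {"running", "cancelled"},            # waiting for resources
    "running": {"completed", "failed", "stopped"},  # currently executing
    "completed": set(),
    "failed": set(),
    "stopped": set(),
    "cancelled": set(),
}


def is_terminal(status: str) -> bool:
    """A status is terminal when no further transitions are allowed."""
    return not TRANSITIONS[status]


def advance(current: str, new: str) -> str:
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"invalid transition: {current} -> {new}")
    return new


status = advance("pending", "running")
status = advance(status, "completed")
print(status, is_terminal(status))  # completed True
```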

**Actions Available**

* **View Logs**: See stdout/stderr
* **View Metrics**: Charts and graphs
* **Stop**: Terminate running experiment
* **Clone**: Create a copy with same config
* **Compare**: Compare with other experiments
* **Export**: Download results and artifacts

## Comparing Experiments

Compare multiple experiments side-by-side:

1. **Select Experiments**: Tick the checkboxes for two or more experiments
2. **Click Compare**: Opens the comparison view
3. **View Differences**:
   * Hyperparameters table
   * Metrics charts (overlaid)
   * Resource usage comparison
   * Final results summary
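
The hyperparameters table is most useful when it highlights what differs between runs. The function below sketches that idea for experiment configurations exported as plain dictionaries; the input shape and the name `diff_hyperparameters` are assumptions for illustration, not the comparison view's implementation.

```python
from typing import Any, Dict


def diff_hyperparameters(runs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only the parameters whose values are not identical across all runs.

    `runs` maps experiment name -> {parameter: value}. A parameter missing
    from a run shows up as None for that run.
    """
    all_params = sorted({param for cfg in runs.values() for param in cfg})
    table: Dict[str, Dict[str, Any]] = {}
    for param in all_params:
        values = {name: cfg.get(param) for name, cfg in runs.items()}
        first = next(iter(values.values()))
        if any(value != first for value in values.values()):
            table[param] = values
    return table


runs = {
    "resnet50-imagenet-baseline-v1": {"optimizer": "sgd", "learning_rate": 0.1, "epochs": 90},
    "resnet50-imagenet-lowlr-v1": {"optimizer": "sgd", "learning_rate": 0.05, "epochs": 90},
}
print(diff_hyperparameters(runs))
# {'learning_rate': {'resnet50-imagenet-baseline-v1': 0.1, 'resnet50-imagenet-lowlr-v1': 0.05}}
```

Comparing with `!=` instead of hashing the values keeps the function working for unhashable settings such as lists of layer sizes.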

## Best Practices

**Naming Conventions**

```
{model}-{dataset}-{variant}-{version}
Examples:
- resnet50-imagenet-baseline-v1
- bert-squad-finetuned-v2
```
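
A small helper can build names that follow this convention and reject ones that don't. The pattern is a sketch that assumes each of `{model}`, `{dataset}`, and `{variant}` is a single lowercase alphanumeric token and that `{version}` is `v` followed by digits; the convention itself does not spell out those rules, and `make_name` is a hypothetical helper.

```python
import re

# {model}-{dataset}-{variant}-{version}, with each of the first three parts a
# single lowercase alphanumeric token (an assumption made for this sketch).
NAME_PATTERN = re.compile(r"^[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-v\d+$")


def make_name(model: str, dataset: str, variant: str, version: int) -> str:
    """Build an experiment name and verify it matches the convention."""
    name = f"{model}-{dataset}-{variant}-v{version}".lower()
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


print(make_name("ResNet50", "imagenet", "baseline", 1))  # resnet50-imagenet-baseline-v1
```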

**Tagging Strategy**

* **Domain**: `computer-vision`, `nlp`, `audio`
* **Task**: `classification`, `detection`, `segmentation`
* **Stage**: `exploration`, `tuning`, `production`

**Resource Optimization**

* Start with minimal resources, scale up as needed
* Use GPU only when necessary
* Monitor resource utilization

## Next Steps

* Register your trained model in [Models](/kaisar-network/kaisar-ai-ops/deep-learning-platform/models.md)
* Deploy to production via [Deployments](/kaisar-network/kaisar-ai-ops/deep-learning-platform/deployments.md)
* Monitor performance in [Analytics](/kaisar-network/kaisar-ai-ops/deep-learning-platform/analytics.md)


