Deployments
Deploy your models to production with auto-scaling and load balancing.

Creating a Deployment
Navigate to Deployments, then click Create.

Basic Information
Deployment Name* (Required)
Enter a descriptive name for the deployment
Example:
resnet-prod, bert-api-v1
Description (Optional)
Deployment purpose and details
Model ID* (Required)
ID of the model to deploy
Model Version* (Required)
Version of the model to deploy
Environment* (Required)
Select deployment environment:
Development
Staging
Production
Default:
development
Resource Configuration
CPU Cores* (Required)
Number of CPU cores per instance
Example:
4, 8, 16
Memory (GB)* (Required)
Memory allocation per instance in GB
Example:
8, 16, 32
GPU Count (Optional)
Number of GPUs (0 for no GPU)
Default:
0
GPU Type (Optional)
Select GPU type if GPU Count > 0:
NVIDIA T4
NVIDIA V100
NVIDIA A100
Min Replicas* (Required)
Minimum number of instances
Example:
1, 2
Max Replicas* (Required)
Maximum number of instances
Example:
10, 20
Target CPU Utilization (%)* (Required)
CPU threshold for scaling
Example:
70, 80
Target Memory Utilization (%)* (Required)
Memory threshold for scaling
Example:
80, 90
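Because CPU and memory are set per instance, the total footprint of a deployment depends on the replica range. The sketch below multiplies the per-instance values by Min Replicas and Max Replicas to show the smallest and largest footprint. The function name and dictionary keys are hypothetical, and treating GPU Count as a per-instance value is an assumption.

```python
def capacity_envelope(cpu_cores, memory_gb, gpu_count, min_replicas, max_replicas):
    """Total resources at the minimum and maximum replica counts.

    Assumes every value is allocated per instance (an assumption for GPU Count).
    """
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")

    def totals(replicas):
        return {
            "replicas": replicas,
            "cpu_cores": cpu_cores * replicas,
            "memory_gb": memory_gb * replicas,
            "gpus": gpu_count * replicas,
        }

    return {"min": totals(min_replicas), "max": totals(max_replicas)}


envelope = capacity_envelope(cpu_cores=4, memory_gb=16, gpu_count=0,
                             min_replicas=2, max_replicas=10)
print(envelope["min"])  # 2 replicas: 8 cores, 32 GB
print(envelope["max"])  # 10 replicas: 40 cores, 160 GB
```

Checking the upper bound before saving helps confirm that the maximum replica count fits the available quota.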
Scaling Configuration
Enable Auto-Scaling (Checkbox)
Enable automatic scaling based on metrics
When enabled:
Min Instances*: Minimum instances to maintain
Max Instances*: Maximum instances allowed
Target GPU Utilization (%): GPU threshold
Target Memory Utilization (%): Memory threshold
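To make the effect of a utilization target concrete, here is a minimal sketch of one common proportional scaling rule, the one used by the Kubernetes Horizontal Pod Autoscaler. This page does not document the platform's actual scaling algorithm, so the rule and the function name `desired_replicas` are assumptions for illustration only.

```python
import math


def desired_replicas(current, utilization, target, min_replicas, max_replicas):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas].

    utilization and target are percentages, for example 90 and 70.
    """
    if target <= 0:
        raise ValueError("target utilization must be positive")
    raw = math.ceil(current * utilization / target)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(4, 90, 70, 2, 10))   # 4 * 90 / 70 = 5.14..., rounded up to 6
print(desired_replicas(4, 20, 70, 2, 10))   # 1.14... rounds up to 2, the minimum
print(desired_replicas(8, 140, 70, 2, 10))  # 16, capped at the maximum of 10
```

Under this rule, a lower target adds instances earlier, and the min/max bounds always take precedence over the computed value.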
Load Balancer
Enable Load Balancer (Checkbox)
Enable load balancing across instances
When enabled:
Service Type*: Round Robin, Least Connections, IP Hash
Health Check URL*: Endpoint for health checks (e.g., /health)
Health Check Interval (seconds)*: Frequency of health checks
Sticky Sessions: Enable session affinity
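The three options differ in how they pick an instance for each request. The sketch below shows the textbook form of each algorithm. It is not the platform's implementation, and the class, function, and instance names are hypothetical.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out instances in a fixed rotating order."""

    def __init__(self, instances):
        self._cycle = cycle(list(instances))

    def pick(self):
        return next(self._cycle)


def least_connections(active):
    """Pick the instance with the fewest open connections.

    active maps instance name -> open connection count; ties break by name.
    """
    return min(active, key=lambda name: (active[name], name))


def ip_hash(client_ip, instances):
    """Map a client IP to the same instance while the instance list is unchanged."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


instances = ["inst-a", "inst-b", "inst-c"]
rr = RoundRobin(instances)
print([rr.pick() for _ in range(4)])  # inst-a, inst-b, inst-c, inst-a
print(least_connections({"inst-a": 5, "inst-b": 2, "inst-c": 7}))  # inst-b
print(ip_hash("203.0.113.7", instances) == ip_hash("203.0.113.7", instances))  # True
```

Round Robin spreads requests evenly, Least Connections adapts to uneven request durations, and IP Hash keeps a given client on one instance.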
Actions
Cancel: Discard and close
Create Deployment: Submit and create the deployment
Example Configuration
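As an illustration, the form fields above can be written down as a Python dictionary. The key names are hypothetical (this page documents only the UI form, not an API), and the values are taken from the examples on this page. The `validate` function applies a few plausible sanity checks; they are assumptions, not the platform's actual validation rules.

```python
deployment = {
    "name": "bert-api-v1",
    "description": "Sentiment analysis API",  # illustrative value
    "model_id": "model-001",
    "model_version": "v2.1.0",
    "environment": "production",
    "cpu_cores": 4,
    "memory_gb": 16,
    "gpu_count": 0,
    "gpu_type": None,
    "min_replicas": 2,
    "max_replicas": 10,
    "target_cpu_utilization": 70,
    "target_memory_utilization": 80,
    "load_balancer": {
        "enabled": True,
        "algorithm": "round_robin",
        "health_check_path": "/health",
        "health_check_interval_seconds": 30,
        "sticky_sessions": False,
    },
}

# Fields marked with * (Required) in the form.
REQUIRED = ("name", "model_id", "model_version", "environment", "cpu_cores",
            "memory_gb", "min_replicas", "max_replicas",
            "target_cpu_utilization", "target_memory_utilization")


def validate(cfg):
    """Return a list of problems; an empty list means the checks passed."""
    errors = [f"{key} is required" for key in REQUIRED if cfg.get(key) in (None, "")]
    if errors:
        return errors
    if cfg["environment"] not in ("development", "staging", "production"):
        errors.append("environment must be development, staging, or production")
    if cfg["min_replicas"] > cfg["max_replicas"]:
        errors.append("min_replicas must not exceed max_replicas")
    for key in ("target_cpu_utilization", "target_memory_utilization"):
        if not 0 < cfg[key] <= 100:
            errors.append(f"{key} must be between 1 and 100")
    if (cfg.get("gpu_count") or 0) > 0 and not cfg.get("gpu_type"):
        errors.append("gpu_type is required when gpu_count > 0")
    return errors


print(validate(deployment))  # []
```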
Viewing Deployment Details
To view detailed information about a deployment:
Navigate to Deployments
Click on a deployment from the list
View comprehensive details in the modal dialog

Details Panel Sections:
Basic Information:
Deployment Name: e.g., "BERT Sentiment API"
Description: Full description of the deployment
Model ID: ID of the deployed model (e.g., "model-001")
Model Version: Version being deployed (e.g., "v2.1.0")
Environment: Production, Staging, or Development
Deployment Strategy: Blue-Green, Rolling Update, Canary
Resource Configuration:
CPU Cores: Number of CPU cores per instance (e.g., 4)
Memory (GB): Memory allocation (e.g., 16 GB)
GPU Count: Number of GPUs (e.g., 2)
GPU Type: GPU model (e.g., A100, V100)
Storage (GB): Storage allocation (e.g., 200 GB)
Network Bandwidth (Mbps): Network bandwidth (e.g., 1000 Mbps)
Scaling Configuration:
Current Instances: Number of running instances (e.g., 2)
Enable Auto-Scaling: Checkbox status
Min Instances: Minimum replicas (e.g., 2)
Max Instances: Maximum replicas (e.g., 10)
Target CPU Utilization (%): CPU scaling threshold (e.g., 70%)
Target GPU Utilization (%): GPU scaling threshold
Target Memory Utilization (%): Memory scaling threshold (e.g., 80%)
Load Balancer:
Enable Load Balancer: Checkbox status
Load Balancing Algorithm: Round Robin, Least Connections, IP Hash
Health Check Path: Endpoint for health checks (e.g., "/health")
Health Check Interval (seconds): Check frequency (e.g., 30)
Sticky Sessions: Checkbox for session affinity
Editing a Deployment
To update deployment configuration:
Open deployment details page
Click Edit button (or three-dot menu → Edit)
Modify editable fields in the Edit Deployment modal
Click Update Deployment to save changes
[!NOTE] The Edit form is identical to the View form, but with editable fields and an "Update Deployment" button. Some changes may require a deployment restart.
Editable Fields:
✅ Description
✅ Environment variables
✅ Min/Max replicas
✅ Auto-scaling thresholds
✅ Health check settings
✅ Load balancer configuration
⚠️ CPU/Memory (requires restart)
❌ Model ID (use Update Model instead)
❌ Deployment name (cannot edit)
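The list above can be read as a lookup table. The sketch below compares an old and a new configuration and sorts each changed field into one of the three groups. The key names are hypothetical, and the grouping is transcribed from the list rather than taken from any documented API. Fields in the editable group may still need a restart, as the note above says.

```python
# Field groups transcribed from the Editable Fields list; key names are hypothetical.
EDITABLE = {"description", "environment_variables", "min_replicas", "max_replicas",
            "target_cpu_utilization", "target_memory_utilization",
            "health_check_path", "health_check_interval_seconds",
            "load_balancing_algorithm", "sticky_sessions"}
RESTART_REQUIRED = {"cpu_cores", "memory_gb"}
NOT_EDITABLE = {"name", "model_id"}


def classify_changes(old, new):
    """Group every changed field by how the Edit form treats it."""
    buckets = {"editable": [], "restart_required": [], "not_editable": [], "unlisted": []}
    for key in sorted(set(old) | set(new)):
        if old.get(key) == new.get(key):
            continue  # unchanged field
        if key in NOT_EDITABLE:
            buckets["not_editable"].append(key)
        elif key in RESTART_REQUIRED:
            buckets["restart_required"].append(key)
        elif key in EDITABLE:
            buckets["editable"].append(key)
        else:
            buckets["unlisted"].append(key)
    return buckets


old = {"name": "bert-api-v1", "model_id": "model-001", "cpu_cores": 4, "max_replicas": 10}
new = {"name": "bert-api-v1", "model_id": "model-002", "cpu_cores": 8, "max_replicas": 20}
print(classify_changes(old, new))
# {'editable': ['max_replicas'], 'restart_required': ['cpu_cores'],
#  'not_editable': ['model_id'], 'unlisted': []}
```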
Updating Model Version
To deploy a new model version:
Open deployment details
Click Update Model button
Select new model version
Choose update strategy:
Rolling Update: Gradual replacement (zero downtime)
Blue-Green: Switch all at once
Canary: Test with small percentage first
Click Update
Update Strategies:
Rolling Update (Recommended):
Gradually replaces old instances
Zero downtime
Automatic rollback on failure
Blue-Green:
Deploys new version alongside old
Switches traffic all at once
Quick rollback possible
Canary:
Routes small % of traffic to new version
Monitor performance
Gradually increase if successful
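To illustrate the canary strategy, here is a minimal sketch of percentage-based traffic splitting with staged promotion. It hashes a request ID into one of 100 buckets so that routing is deterministic. The function names, the request ID format, and the stage percentages are all hypothetical, since this page does not specify how the platform splits or promotes traffic.

```python
import hashlib


def route(request_id, canary_percent):
    """Return "canary" or "stable" for a request, deterministically."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


def next_stage(current_percent, healthy, stages=(5, 25, 50, 100)):
    """Advance to the next stage if the canary is healthy, otherwise send it no traffic."""
    if not healthy:
        return 0
    for stage in stages:
        if stage > current_percent:
            return stage
    return 100


def canary_share(canary_percent, n=10_000):
    """Fraction of n synthetic requests that reach the canary."""
    hits = sum(route(f"req-{i}", canary_percent) == "canary" for i in range(n))
    return hits / n


print(canary_share(10))                 # close to 0.10
print(next_stage(5, healthy=True))      # 25
print(next_stage(25, healthy=False))    # 0
```

Hashing the request ID, rather than choosing at random, keeps retries of the same request on the same version.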
Scaling a Deployment
Manual Scaling:
Open deployment details
Click Scale button
Adjust number of replicas
Click Apply
Auto-scaling:
Open deployment details
Click Edit
Enable auto-scaling
Set min/max replicas
Configure scaling triggers
Save changes
Stopping a Deployment
To temporarily stop a deployment:
Open deployment details
Click Stop button
Confirm action
All instances will shut down
Endpoint will become unavailable
Use Cases:
Maintenance window
Cost optimization
Testing in isolation
Restarting a Deployment
To restart a stopped deployment:
Open deployment details
Click Start button
Deployment will resume with previous configuration
Deleting a Deployment
To permanently remove a deployment:
Navigate to deployment details
Click Delete button
Confirm deletion
[!WARNING] Deleting a deployment will:
Shut down all instances
Remove the endpoint
Delete deployment configuration
This action cannot be undone!
Before Deleting:
Stop sending traffic to the endpoint
Update client applications
Export logs if needed
Verify you have the correct deployment
Monitoring Deployments
Real-time Metrics:
Request rate
Latency (p50, p95, p99)
Error rate
Resource usage
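For readers unfamiliar with latency percentiles, the sketch below computes p50, p95, and p99 from a list of samples using the nearest-rank method, and the error rate as failed requests over total requests. The sample values are made up, and the platform may use a different percentile definition.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def error_rate(failed, total):
    """Share of failed requests, as a value between 0 and 1."""
    return failed / total if total else 0.0


latencies_ms = [12, 15, 11, 240, 18, 14, 13, 16, 17, 19]  # made-up samples
print(percentile(latencies_ms, 50))  # 15
print(percentile(latencies_ms, 95))  # 240
print(percentile(latencies_ms, 99))  # 240
print(error_rate(3, 200))            # 0.015
```

A single slow request dominates p95 and p99 here while leaving p50 unchanged, which is why tail percentiles are tracked alongside the median.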
Actions:
Scale up/down
Update model version
View logs
Rollback
Best Practices
Set appropriate min/max replicas
Configure auto-scaling thresholds
Enable health checks
Use load balancing for high traffic
Monitor performance continuously
Next Steps
Monitor in Analytics
View logs and metrics
Set up alerts