Deployments

Deploy your models to production with auto-scaling and load balancing.

Creating a Deployment

Navigate to Deployments → Click Create

Basic Information

Deployment Name* (Required)

Enter a descriptive name for the deployment
Example: resnet-prod, bert-api-v1

Description (Optional)

Deployment purpose and details

Model ID* (Required)

ID of the model to deploy
Helper text: "ID of the model to deploy"

Model Version* (Required)

Version of the model to deploy
Helper text: "Version of the model to deploy"

Environment* (Required)

Select deployment environment:
- Development
- Staging
- Production
Default: development

Resource Configuration

CPU Cores* (Required)

Number of CPU cores per instance
Example: 4, 8, 16

Memory (GB)* (Required)

Memory allocation per instance in GB
Example: 8, 16, 32

GPU Count (Optional)

Number of GPUs (0 or GPU Count)
Default: 0

GPU Type (Optional)

Select GPU type if GPU Count > 0:
- NVIDIA T4
- NVIDIA V100
- NVIDIA A100

Min Replicas* (Required)

Minimum number of instances
Example: 1, 2

Max Replicas* (Required)

Maximum number of instances
Example: 10, 20

Target CPU Utilization (%)* (Required)

CPU threshold for scaling
Example: 70, 80

Target Memory Utilization (%)* (Required)

Memory threshold for scaling
Example: 80, 90

Scaling Configuration

Enable Auto-Scaling (Checkbox)

Enable automatic scaling based on metrics

When enabled:

Min Instances*: Minimum instances to maintain
Max Instances*: Maximum instances allowed
Target GPU Utilization (%): GPU threshold
Target Memory Utilization (%): Memory threshold

Load Balancer

Enable Load Balancer (Checkbox)

Enable load balancing across instances

When enabled:

Service Type*: Round Robin, Least Connections, IP Hash
Health Check URL*: Endpoint for health checks (e.g., /health)
Health Check Interval (seconds)*: Frequency of health checks
Sticky Sessions: Enable session affinity

Actions

Cancel: Discard and close
Create Deployment: Submit and create the deployment

Example Configuration

Deployment Name: resnet-production-v1
Model ID: model_abc123
Model Version: 1.0.0
Environment: Development

Resource Configuration:
  CPU Cores: 4
  Memory (GB): 16
  GPU Count: 0
  Min Replicas: 2
  Max Replicas: 10

Scaling:
  Enable Auto-Scaling: ✓
  Min Instances: 2
  Max Instances: 10
  Target CPU: 70%

Load Balancer:
  Enable: ✓
  Service Type: Round Robin
  Health Check URL: /health
  Health Check Interval: 30s
  Sticky Sessions: ✓

Viewing Deployment Details

To view detailed information about a deployment:

Navigate to Deployments
Click on a deployment from the list
View comprehensive details in the modal dialog

Details Panel Sections:

Basic Information:

Deployment Name: e.g., "BERT Sentiment API"
Description: Full description of the deployment
Model ID: ID of the deployed model (e.g., "model-001")
Model Version: Version being deployed (e.g., "v2.1.0")
Environment: Production, Staging, or Development
Deployment Strategy: Blue-Green, Rolling Update, Canary

Resource Configuration:

CPU Cores: Number of CPU cores per instance (e.g., 4)
Memory (GB): Memory allocation (e.g., 16 GB)
GPU Count: Number of GPUs (e.g., 2)
GPU Type: GPU model (e.g., A100, V100)
Storage (GB): Storage allocation (e.g., 200 GB)
Network Bandwidth (Mbps): Network bandwidth (e.g., 1000 Mbps)

Scaling Configuration:

Current Instances: Number of running instances (e.g., 2)
Enable Auto-Scaling: Checkbox status
Min Instances: Minimum replicas (e.g., 2)
Max Instances: Maximum replicas (e.g., 10)
Target CPU Utilization (%): CPU scaling threshold (e.g., 70%)
Target GPU usage for trigger scaling: GPU threshold
Target Memory Utilization (%): Memory threshold (e.g., 80%)
Target memory usage for trigger scaling: Memory threshold

Load Balancer:

Enable Load Balancer: Checkbox status
Load Balancing Algorithm: Round Robin, Least Connections, IP Hash
Health Check Path: Endpoint for health checks (e.g., "/health")
Health Check Interval (seconds): Check frequency (e.g., 30)
Sticky Sessions: Checkbox for session affinity

Editing a Deployment

To update deployment configuration:

Open deployment details page
Click Edit button (or three-dot menu → Edit)
Modify editable fields in the Edit Deployment modal

Click Update Deployment to save changes

[!NOTE] The Edit form is identical to the View form, but with editable fields and an "Update Deployment" button. Some changes may require a deployment restart.

[!NOTE] Some changes may require a deployment restart to take effect.

Editable Fields:

✅ Description
✅ Environment variables
✅ Min/Max replicas
✅ Auto-scaling thresholds
✅ Health check settings
✅ Load balancer configuration
⚠️ CPU/Memory (requires restart)
❌ Model ID (use Update Model instead)
❌ Deployment name (cannot edit)

Updating Model Version

To deploy a new model version:

Open deployment details
Click Update Model button
Select new model version
Choose update strategy:
- Rolling Update: Gradual replacement (zero downtime)
- Blue-Green: Switch all at once
- Canary: Test with small percentage first
Click Update

Update Strategies:

Rolling Update (Recommended):

Gradually replaces old instances
Zero downtime
Automatic rollback on failure

Blue-Green:

Deploys new version alongside old
Switches traffic all at once
Quick rollback possible

Canary:

Routes small % of traffic to new version
Monitor performance
Gradually increase if successful

Scaling a Deployment

Manual Scaling:

Open deployment details
Click Scale button
Adjust number of replicas
Click Apply

Auto-scaling:

Open deployment details
Click Edit
Enable auto-scaling
Set min/max replicas
Configure scaling triggers
Save changes

Stopping a Deployment

To temporarily stop a deployment:

Open deployment details
Click Stop button
Confirm action
All instances will shut down
Endpoint will become unavailable

Use Cases:

Maintenance window
Cost optimization
Testing in isolation

Restarting a Deployment

To restart a stopped deployment:

Open deployment details
Click Start button
Deployment will resume with previous configuration

Deleting a Deployment

To permanently remove a deployment:

Navigate to deployment details
Click Delete button
Confirm deletion

[!WARNING] Deleting a deployment will:
Shut down all instances
Remove the endpoint
Delete deployment configuration
This action cannot be undone!

Before Deleting:

Stop sending traffic to the endpoint
Update client applications
Export logs if needed
Verify you have the correct deployment

Monitoring Deployments

Real-time Metrics:

Request rate
Latency (p50, p95, p99)
Error rate
Resource usage

Actions:

Scale up/down
Update model version
View logs
Rollback

Best Practices

Set appropriate min/max replicas
Configure auto-scaling thresholds
Enable health checks
Use load balancing for high traffic
Monitor performance continuously

Next Steps

Monitor in Analytics
View logs and metrics
Set up alerts

PreviousJobs NextAnalytics

Last updated 4 months ago

hashtagCreating a Deployment

hashtagBasic Information

hashtagResource Configuration

hashtagScaling Configuration

hashtagLoad Balancer

hashtagActions

hashtagExample Configuration

hashtagViewing Deployment Details

hashtagEditing a Deployment

hashtagUpdating Model Version

hashtagScaling a Deployment

hashtagStopping a Deployment

hashtagRestarting a Deployment

hashtagDeleting a Deployment

hashtagMonitoring Deployments

hashtagBest Practices

hashtagNext Steps