Troubleshooting
Common issues and solutions for Kaisar AI Ops.
Overview
This section covers:
FAQ - Frequently asked questions
Known Issues - Current limitations
Support - How to get help
Quick Troubleshooting
Cannot Log In
Symptoms: The login page shows an error or redirects you back to it
Solutions:
Clear browser cache and cookies
Try incognito/private mode
Verify your credentials with an administrator
Check whether multi-factor authentication (MFA) is required
Try password reset
Experiment Won't Start
Symptoms: The experiment is stuck in "pending" status
Solutions:
Check resource quotas
Verify compute resources are available
Review the experiment configuration
Check cluster capacity
Check the experiment logs for errors
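The status check above can be scripted so that a briefly queued experiment is easy to tell apart from a stuck one. This is a minimal sketch that assumes you can fetch the current status string (for example `"pending"` or `"running"`) through some function of your own; `get_status` is a hypothetical callable, not part of any documented Kaisar AI Ops SDK, and the timeout and poll interval are arbitrary defaults.

```python
import time
from typing import Callable


def wait_while_pending(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the experiment leaves "pending" or the timeout elapses.

    Returns the first non-pending status, or raises TimeoutError.
    `get_status` is a hypothetical callable you supply.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status != "pending":
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"experiment still pending after {timeout_s:.0f}s; "
                "check quotas, cluster capacity, and the experiment logs"
            )
        sleep(poll_interval_s)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting; in normal use, leave them at their defaults.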
Slow Dashboard Loading
Symptoms: The dashboard takes a long time to load
Solutions:
Check your internet connection
Clear your browser cache
Reduce the number of displayed items
Check the system status page
Try a different browser
API Requests Failing
Symptoms: Requests return 401, 403, or 500 errors
Solutions:
Verify that the API token is valid
Check the token's permissions
Review the rate limits
Check the API endpoint URL
Verify the request format
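The checklist above maps onto standard HTTP semantics: 401 usually means the token is missing or invalid, 403 means the request was authenticated but is not permitted, and 500 is a server-side failure. The sketch below pairs those hints with exponential backoff for transient errors. It assumes the API signals rate limiting with HTTP 429, a common convention that is not confirmed here, and `send` is a hypothetical callable wrapping whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Tuple

# Typical meaning of each status code (standard HTTP semantics).
HINTS = {
    401: "Unauthorized: token missing, malformed, or expired; verify the API token",
    403: "Forbidden: token accepted but lacks permission; check token permissions",
    429: "Too Many Requests: rate limit reached; back off and retry",
    500: "Internal Server Error: server-side failure; retry later or contact support",
}

# Assumption: these statuses are transient and worth retrying.
RETRYABLE = {429, 500, 502, 503, 504}


def explain(status: int) -> str:
    """Map a status code to a troubleshooting hint."""
    return HINTS.get(status, "Unrecognized status; check the API Reference")


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry transient failures with exponential backoff.

    send() returns (status_code, body). Non-retryable codes such as 401 and
    403 are returned at once, because retrying will not fix a bad token.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts - 1):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of random jitter.
        sleep(base_delay_s * 2 ** attempt + random.uniform(0, 0.1))
    return send()  # final attempt: return whatever comes back
```

The retry set and delays are placeholders; tune them against the limits documented in the API Reference.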
Common Error Messages
"Quota Exceeded"
Cause: A resource limit has been reached
Solutions:
Check your current usage
Clean up unused resources
Request a quota increase
Optimize resource allocation
"Permission Denied"
Cause: Insufficient permissions
Solutions:
Check your role
Request access from an administrator
Verify resource sharing settings
Check organization membership
"Resource Not Found"
Cause: The resource ID is invalid, or the resource has been deleted
Solutions:
Verify the resource ID
Check whether the resource was deleted
Ensure you have access to the resource
List the available resources first to confirm the correct ID
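Once you have listed the resources, a fuzzy match can reveal whether a "not found" error is just a typo in the ID. This sketch uses the standard-library `difflib.get_close_matches`; the ID format shown in the usage line (`exp-1042`) is hypothetical, and `known_ids` stands for whatever list your listing call returned.

```python
import difflib
from typing import Iterable, List


def check_resource_id(resource_id: str, known_ids: Iterable[str]) -> List[str]:
    """Return [] if resource_id exists, else up to 3 similar IDs (likely typos)."""
    ids = list(known_ids)
    if resource_id in ids:
        return []
    # cutoff=0.6 is difflib's default similarity threshold.
    return difflib.get_close_matches(resource_id, ids, n=3, cutoff=0.6)


# Hypothetical usage: suggestions = check_resource_id("exp-1024", ["exp-1042", "dataset-77"])
```

An empty suggestion list for an ID you know is correct points toward the other causes: deletion or missing access.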
Performance Issues
Slow Experiment Training
Possible Causes:
Inefficient data loading
Suboptimal batch size
CPU bottleneck
Network I/O issues
Solutions:
Profile your code
Optimize the data pipeline
Increase the batch size (if GPU memory allows)
Cache preprocessed data
Check GPU utilization
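For Python training code, the standard-library `cProfile` module gives a quick first look at whether time is spent in data loading or in computation. In this sketch, `load_batch` and `train_step` are hypothetical placeholders for your own functions. GPU work runs asynchronously, so `cProfile` may attribute its time to whatever call waits on the GPU; use your framework's own profiler for kernel-level detail.

```python
import cProfile
import io
import pstats


def load_batch(n: int) -> list:
    # Hypothetical stand-in for a slow data-loading/preprocessing step.
    return [sum(i * j for j in range(50)) for i in range(n)]


def train_step(n: int) -> int:
    batch = load_batch(n)
    # Hypothetical stand-in for the forward/backward pass.
    return len(batch)


def profile_top(func, *args, limit: int = 5) -> str:
    """Run func(*args) under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_top(train_step, 2000))
```

If the data-loading function dominates the cumulative column, the data pipeline and caching items above are the place to start.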
High Memory Usage
Possible Causes:
Large batch size
Memory leaks
Inefficient model architecture
Solutions:
Reduce the batch size
Use gradient accumulation to preserve the effective batch size
Enable mixed-precision training
Profile memory usage
Clear unused variables
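Gradient accumulation replaces one large batch with several smaller micro-batches whose gradients are combined before a single optimizer step, so only one micro-batch needs to be in memory at a time. The toy sketch below uses a one-parameter linear model in plain Python (no ML framework, purely illustrative) to show that averaging gradients over equal-sized micro-batches reproduces the full-batch gradient. In a real framework, the same idea is usually implemented by dividing each micro-batch loss by the number of accumulation steps before the backward pass.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: Batch, micro_batch_size: int) -> float:
    """Average gradients over equal-sized micro-batches.

    Effective batch size = micro_batch_size * steps = len(data).
    """
    if len(data) % micro_batch_size != 0:
        raise ValueError("data size must be a multiple of micro_batch_size")
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += grad_mse(w, micro)  # accumulate instead of updating w
    return total / steps  # scale by the number of accumulation steps
```

With equal micro-batch sizes, the two results match up to floating-point rounding, which is why accumulation lets you shrink the per-step batch without changing the update.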
Integration Issues
Authentication Service Failing
Solutions:
Verify that the Authentication Service is running
Check the client configuration
Review the realm settings
Verify the redirect URIs
Check the SSL/TLS certificates
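Redirect URI mismatches are often caused by small differences: a trailing slash, `http` instead of `https`, or a different port. This sketch checks a URI against a list of allowed patterns. It assumes exact matching plus an optional trailing `*` meaning "any suffix"; that wildcard rule is an assumption, so confirm the actual matching rules in your identity provider's client settings. The example URLs in the usage line are hypothetical.

```python
from typing import Iterable


def redirect_uri_allowed(redirect_uri: str, allowed: Iterable[str]) -> bool:
    """Return True if redirect_uri matches one of the allowed patterns.

    Assumed rules: exact string match, or prefix match when the pattern
    ends with "*". Scheme, host, port, path, and trailing slash all matter.
    """
    for pattern in allowed:
        if pattern.endswith("*"):
            if redirect_uri.startswith(pattern[:-1]):
                return True
        elif redirect_uri == pattern:
            return True
    return False


# Hypothetical usage:
# redirect_uri_allowed("https://ops.example.com/callback/", ["https://ops.example.com/callback"])
# returns False because of the trailing slash.
```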
Storage Connection Failed
Solutions:
Verify the storage credentials
Check that the bucket/container exists
Review IAM permissions
Test network connectivity
Verify the endpoint URL
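Many endpoint problems can be caught before any request is sent: a missing scheme, stray whitespace from a copied value, or a malformed port. This is a minimal sketch using the standard-library `urllib.parse.urlparse`; the checks are heuristics rather than a complete validator, and the hostnames in the tests are placeholders.

```python
from typing import List
from urllib.parse import urlparse


def endpoint_problems(url: str) -> List[str]:
    """Return likely problems with a storage endpoint URL (empty list if none found)."""
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        # A bare "host.example.com" parses with an empty scheme and no hostname.
        problems.append(f"unexpected scheme {parsed.scheme!r}; expected https")
    elif parsed.scheme == "http":
        problems.append("plain http sends credentials unencrypted; prefer https")
    if not parsed.hostname:
        problems.append("missing hostname")
    try:
        _ = parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    return problems
```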
Getting Help
Self-Service Resources
FAQ - Common questions
User Guide - Feature documentation
API Reference - API documentation
Contact Support
Support Portal - Submit tickets
Email: [email protected]
Slack: #kaisar-support
Community
GitHub Discussions
Stack Overflow (tag: kaisar-ai-ops)
Community Forum
Diagnostic Tools
Health Check
Check system health:
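A minimal Python sketch, assuming your deployment exposes an HTTP health endpoint that returns status 200 when healthy. No specific path is documented here, so pass the health URL for your own deployment; the response body format is also an assumption, so the sketch returns it as plain text.

```python
import urllib.error
import urllib.request


def check_health(url: str, timeout_s: float = 5.0) -> dict:
    """GET a health endpoint and summarize the result.

    Returns {"ok": bool, "status": int or None, "body": str}.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return {"ok": resp.status == 200, "status": resp.status, "body": body}
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        body = err.read().decode("utf-8", errors="replace")
        return {"ok": False, "status": err.code, "body": body}
    except OSError as err:
        # DNS failure, refused connection, or timeout: no HTTP status at all.
        return {"ok": False, "status": None, "body": str(err)}
```

A `status` of `None` points to a network problem rather than an unhealthy service.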
API Verification
Verify API access:
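A minimal Python sketch that sends one authenticated request and translates the status code into a hint. It assumes Bearer-token authentication in the `Authorization` header, which is common but not confirmed here; check the API Reference for the actual scheme and for an inexpensive endpoint to call.

```python
import urllib.error
import urllib.request


def verify_token(url: str, token: str, timeout_s: float = 5.0) -> str:
    """Send an authenticated GET and return a short diagnostic message.

    Assumes the API accepts "Authorization: Bearer <token>".
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return f"OK ({resp.status}): token accepted"
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return "401: token missing, invalid, or expired"
        if err.code == 403:
            return "403: token valid but lacks permission for this resource"
        return f"{err.code}: unexpected status; see the API Reference"
    except OSError as err:
        # No HTTP response at all: check the endpoint URL and network.
        return f"network error: {err}"
```

Read the token from an environment variable or secret store rather than hard-coding it in scripts.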
Network Test
Test connectivity:
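A minimal Python sketch using only the standard `socket` module. It separates a DNS lookup failure from a failed TCP connection, which helps tell name-resolution problems apart from firewall or routing problems. Pass the host and port of the service you are testing (for example, your storage endpoint on port 443); no specific host is assumed.

```python
import socket
from typing import Optional


def tcp_check(host: str, port: int, timeout_s: float = 3.0) -> Optional[str]:
    """Return None if a TCP connection to host:port succeeds, else an error description."""
    try:
        # Resolve first so DNS failures are reported separately.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as err:
        return f"DNS lookup failed for {host}: {err}"
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return None
    except OSError as err:
        # Refused, timed out, or unreachable.
        return f"cannot connect to {host}:{port}: {err}"
```

A successful TCP connection only shows the port is reachable; TLS and authentication can still fail afterwards.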
Best Practices
✅ Check system status before reporting issues
✅ Collect error messages and logs
✅ Try basic troubleshooting first
✅ Document the steps to reproduce the issue
✅ Include relevant screenshots
✅ Provide system information, such as your browser and operating system
Next Steps
Review the FAQ for common questions
Check Known Issues for current limitations
Contact Support if needed