Cloud Deployment

Deploy your fine-tuned models to production with auto-scaling, monitoring, and CI/CD integration.

Quick Deploy

Deploy your model to production in minutes:

One-Click Deploy: Simple deployment from the dashboard

Custom Domains: Use your own domain with SSL certificates
bash
# Deploy via CLI
langtrain deploy create \
  --model my-fine-tuned-model \
  --name production-api \
  --region us-east-1 \
  --min-instances 1 \
  --max-instances 10

python
# Deploy via Python SDK
deployment = client.deployments.create(
    model_id="your-model-id",
    name="production-api",
    config={
        "region": "us-east-1",
        "instance_type": "gpu-medium",
        "min_instances": 1,
        "max_instances": 10,
        "auto_scaling": True
    }
)
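
Once create returns, provisioning usually takes a few minutes. A minimal polling sketch, assuming the SDK exposes a client.deployments.get method and id/status fields (hypothetical names, not documented on this page):

python
import time

# Poll until the deployment settles (the get method and status values
# are assumptions; check the SDK reference for the actual names).
while True:
    deployment = client.deployments.get(deployment.id)
    if deployment.status in ("running", "failed"):
        break
    time.sleep(15)  # back off between polls

print(f"Deployment {deployment.name} is {deployment.status}")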

Container Deployment

Deploy using Docker containers for maximum flexibility:

Custom Images: Bring your own Docker images

Kubernetes: Native Kubernetes support with Helm charts
bash
# Generate Dockerfile
langtrain deploy generate-dockerfile --model my-model

# Build, tag for your registry, and push
docker build -t your-registry/my-model:latest .
docker push your-registry/my-model:latest

yaml
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langtrain-model
spec:
  replicas: 3
  selector:
    matchLabels:
      app: langtrain-model
  template:
    metadata:
      labels:
        app: langtrain-model
    spec:
      containers:
        - name: model
          image: your-registry/my-model:latest
          ports:
            - containerPort: 8000
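
Before pushing, it can help to smoke-test the image locally. A short sketch, assuming the generated container serves the same /health path the load balancer uses below and answers with a JSON body (both assumptions):

python
import requests

# Smoke-test a container started with e.g.:
#   docker run -p 8000:8000 your-registry/my-model:latest
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print("Container healthy:", resp.json())  # assumes a JSON response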

Load Balancing

Distribute traffic across multiple instances:

Health Checks: Automatic health monitoring and failover

Traffic Routing: Smart routing based on model performance
python
# Configure load balancer
deployment_config = {
    "load_balancer": {
        "algorithm": "round_robin",
        "health_check": {
            "path": "/health",
            "interval": 30,
            "timeout": 5,
            "healthy_threshold": 2,
            "unhealthy_threshold": 3
        },
        "sticky_sessions": False
    },
    "auto_scaling": {
        "metric": "requests_per_second",
        "target": 100,
        "scale_up_cooldown": 300,
        "scale_down_cooldown": 600
    }
}
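
To attach this configuration to a live deployment, something like the following should work, assuming the SDK mirrors create with an update method (hypothetical; only create is shown on this page):

python
# The update method name is an assumption, mirroring create above.
client.deployments.update(
    deployment_id="your-deployment-id",
    config=deployment_config
)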

Monitoring & Alerts

Monitor your deployed models in real-time:

Metrics: Request latency, throughput, error rates, and custom metrics

Alerts: Configure alerts for critical issues
python
# Set up monitoring
client.monitoring.create_alert(
    deployment_id="your-deployment-id",
    metric="response_time_p95",
    threshold=2000,  # milliseconds (2 seconds)
    comparison="greater_than",
    notification_channels=["email", "slack"]
)

# Custom metrics
client.monitoring.track_metric(
    deployment_id="your-deployment-id",
    metric_name="business_metric",
    value=42,
    tags={"version": "v1.2", "region": "us-east-1"}
)
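
track_metric composes naturally with your own instrumentation. A sketch that reports per-request latency as a custom metric using only the documented call; serve_request is a hypothetical stand-in for your inference handler:

python
import time

def timed_request(payload):
    # Wrap the (hypothetical) handler with latency tracking.
    start = time.monotonic()
    result = serve_request(payload)
    latency_ms = (time.monotonic() - start) * 1000

    client.monitoring.track_metric(
        deployment_id="your-deployment-id",
        metric_name="handler_latency_ms",
        value=latency_ms,
        tags={"version": "v1.2"}
    )
    return result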

CI/CD Integration

Integrate with your existing CI/CD pipelines:

GitHub Actions: Pre-built actions for deployment automation

API Integration: REST API for programmatic deployments (see the sketch after the workflow below)
yaml
# GitHub Actions workflow
name: Deploy Model
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to LangTrain
        uses: langtrain/deploy-action@v1
        with:
          api-key: ${{ secrets.LANGTRAIN_API_KEY }}
          model-id: ${{ vars.MODEL_ID }}
          deployment-name: production-api