

Cloud Deployment

Deploy your fine-tuned models to production with OpenAI-compatible endpoints.

Deployment options:
  • One-click Deploy
  • Docker
  • Kubernetes
  • AWS/GCP

Push to Langtrain Cloud

Deploy your trained model with a single command. Your model becomes available via an OpenAI-compatible API endpoint.
from langtrain import LoRATrainer

# After training, push to Langtrain Cloud
trainer = LoRATrainer(model="llama-3.3-8b", output_dir="./model")
trainer.train("data.jsonl")

# Deploy to Langtrain Cloud
trainer.push("my-assistant")

# Your model is now available at:
# POST https://api.langtrain.xyz/v1/chat/completions
# Model: my-assistant
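
You can smoke-test the deployment directly from the shell. The bearer-token header below mirrors what the OpenAI SDK sends with an API key and is an assumption about Langtrain's auth scheme:

# Quick smoke test of the deployed endpoint
curl -X POST https://api.langtrain.xyz/v1/chat/completions \
  -H "Authorization: Bearer $LANGTRAIN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-assistant", "messages": [{"role": "user", "content": "Hello"}]}'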

API Endpoint

Access your deployed model using the OpenAI-compatible API. Works with any OpenAI SDK.
import openai

# Use with the OpenAI SDK
client = openai.OpenAI(
    api_key="your-langtrain-api-key",
    base_url="https://api.langtrain.xyz/v1"
)

response = client.chat.completions.create(
    model="my-assistant",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="my-assistant",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
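
For production clients it helps to set timeouts and retries explicitly. The options below are part of the OpenAI Python SDK (v1+), not Langtrain-specific settings; a minimal sketch:

import openai

client = openai.OpenAI(
    api_key="your-langtrain-api-key",
    base_url="https://api.langtrain.xyz/v1",
    timeout=30.0,    # abort requests that take longer than 30 seconds
    max_retries=3    # the SDK retries connection errors and 429/5xx responses
)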

Docker Export

Export your model as a Docker container for self-hosting.
# Export model to Docker
langtrain export docker --model my-assistant --output ./docker

# Build and run locally
cd docker
docker build -t my-assistant:latest .
docker run -p 8000:8000 --gpus all my-assistant:latest

# Test the endpoint
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-assistant", "messages": [{"role": "user", "content": "Hello"}]}'

Kubernetes Deployment

Deploy at scale with Kubernetes using our Helm chart.
# Install Langtrain Helm chart
helm repo add langtrain https://charts.langtrain.xyz
helm repo update

# Deploy with GPU support (the dot in nvidia.com/gpu must be escaped for --set)
helm install my-assistant langtrain/model \
  --set model.name=my-assistant \
  --set model.apiKey=$LANGTRAIN_API_KEY \
  --set 'resources.limits.nvidia\.com/gpu=1' \
  --set replicas=3

# Expose via ingress
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-assistant
spec:
  rules:
  - host: api.mycompany.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-assistant
            port:
              number: 8000
EOF
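
Before cutting traffic over, you can verify the rollout with standard kubectl commands. The deployment and service names below assume the chart names them after the Helm release:

# Watch the rollout complete
kubectl rollout status deployment/my-assistant

# Port-forward and smoke-test without going through the ingress
kubectl port-forward svc/my-assistant 8000:8000 &
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-assistant", "messages": [{"role": "user", "content": "Hello"}]}'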

Auto-scaling

Configure auto-scaling for production workloads. Scale based on request volume or GPU utilization.
# Langtrain Cloud auto-scaling (via dashboard or API)
deployment_config = {
    "model": "my-assistant",
    "scaling": {
        "min_replicas": 1,
        "max_replicas": 10,
        "target_gpu_utilization": 70,
        "scale_up_cooldown": "2m",
        "scale_down_cooldown": "10m"
    },
    "instance_type": "gpu-a100-40gb"
}

# Kubernetes HPA (note: --cpu-percent scales on CPU utilization;
# scaling on GPU utilization requires a custom metrics pipeline)
kubectl autoscale deployment my-assistant \
  --min=1 --max=10 \
  --cpu-percent=70
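
The imperative kubectl autoscale command above has a declarative equivalent that is easier to keep in version control. This is standard Kubernetes autoscaling/v2, not Langtrain-specific; scaling on GPU utilization instead would require a custom metrics pipeline (for example, the NVIDIA DCGM exporter plus a Prometheus metrics adapter):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-assistant
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-assistant
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70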

Export Formats

Export models in various formats for different use cases.
# Export to HuggingFace format
trainer.export("./export", format="huggingface")

# Export to GGUF for llama.cpp
trainer.export("./export", format="gguf", quantization="q4_k_m")

# Export to ONNX for edge deployment
trainer.export("./export", format="onnx")

# Upload to HuggingFace Hub
trainer.push_to_hub("your-username/my-model")
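
To sanity-check the HuggingFace export, load it back with the standard transformers API. This assumes the export directory is a regular transformers checkpoint (config, weights, and tokenizer files); nothing below is Langtrain-specific:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the exported checkpoint like any local HF model
tokenizer = AutoTokenizer.from_pretrained("./export")
model = AutoModelForCausalLM.from_pretrained("./export")

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))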