Full Fine-tuning

Complete guide to full parameter fine-tuning for maximum model customization and performance.

Tags: Full Parameters · Multi-GPU · DeepSpeed · PyTorch

When to Use Full Fine-tuning

Full fine-tuning is recommended when:

  • Maximum Performance: you need the absolute best performance for your task
  • Domain Adaptation: you are adapting to a domain very different from the base model's training data
  • Task Specialization: you are creating highly specialized models for specific tasks
# Full fine-tuning configuration
config = {
    "method": "full",
    "model": "llama-2-7b",
    "learning_rate": 1e-5,
    "batch_size": 8,
    "gradient_accumulation_steps": 4,  # effective batch size: 8 * 4 = 32
    "epochs": 3,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "optimizer": "adamw",
    "scheduler": "cosine",
}

Dataset Preparation

Prepare your dataset for full fine-tuning:

  • Data Quality: high-quality, diverse training data is crucial
  • Format: various formats are supported, including JSONL, CSV, and Parquet
# Example training data format (one JSON object per JSONL line)
{
    "instruction": "Summarize the following text:",
    "input": "Large language models have shown remarkable capabilities...",
    "output": "LLMs demonstrate strong performance across many NLP tasks."
}

# Upload dataset
dataset = client.datasets.upload(
    file_path="full_training_data.jsonl",
    name="full-finetune-dataset",
    validation_split=0.1,
)
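If your data starts out in CSV rather than JSONL, the Python standard library is enough to convert it. A minimal sketch, assuming columns named to match the format above (the `csv_to_jsonl` helper is illustrative, not part of the Langtrain SDK):

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV rows (header: instruction,input,output) to JSONL."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in rows)

sample = (
    "instruction,input,output\n"
    "Summarize the following text:,LLMs are powerful...,LLMs perform well.\n"
)
jsonl = csv_to_jsonl(sample)
print(jsonl)
```

Write the result to a `.jsonl` file and upload it as shown above.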

Training Configuration

Configure your full fine-tuning job:

  • Hardware Requirements: full fine-tuning requires significant GPU memory
  • Training Time: expect longer training runs than with LoRA methods
# Start full fine-tuning job
job = client.fine_tune.create(
    model="mistral-7b",
    dataset=dataset.id,
    config={
        "method": "full",
        "learning_rate": 5e-6,
        "batch_size": 4,
        "epochs": 2,
        "gradient_checkpointing": True,  # trade extra compute for lower memory
        "fp16": True,
        "deepspeed_stage": 2,  # ZeRO stage 2: shard gradients and optimizer states
        "save_steps": 500,
    },
)

print(f"Full fine-tuning job started: {job.id}")
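The hardware-requirements note above can be made concrete with back-of-the-envelope arithmetic: with AdamW, full fine-tuning holds fp16 weights and gradients (2 bytes per parameter each) plus fp32 optimizer states and master weights (roughly 12 bytes per parameter), before counting activations. A rough sketch, not a Langtrain guarantee:

```python
def full_finetune_memory_gb(n_params_billion: float) -> dict:
    """Rough memory breakdown for full fine-tuning with AdamW + fp16."""
    n = n_params_billion * 1e9
    weights = n * 2       # fp16 model weights
    grads = n * 2         # fp16 gradients
    optimizer = n * 12    # fp32 momentum + variance + master weights
    return {
        "weights_gb": weights / 1e9,
        "grads_gb": grads / 1e9,
        "optimizer_gb": optimizer / 1e9,
        "total_gb": (weights + grads + optimizer) / 1e9,
    }

print(full_finetune_memory_gb(7))  # roughly 112 GB for a 7B model
```

This is why a 7B model that fits on one GPU for inference still needs gradient checkpointing, ZeRO sharding, or multiple GPUs for full fine-tuning.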

Distributed Training

Scale your training across multiple GPUs:

  • Multi-GPU: automatic data parallelism across available GPUs
  • DeepSpeed: integration with DeepSpeed for memory-efficient training
# Distributed training configuration
distributed_config = {
    "method": "full",
    "distributed": {
        "strategy": "deepspeed",
        "stage": 3,  # ZeRO stage 3 for maximum memory efficiency
        "gradient_clipping": 1.0,
    },
    "hardware": {
        "gpu_count": 8,
        "instance_type": "gpu-large",
    },
}

# Launch distributed training
job = client.fine_tune.create(
    model="llama-2-13b",
    dataset=dataset.id,
    config=distributed_config,
)
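For reference, a `distributed` block like this corresponds to a standard DeepSpeed configuration. Assuming the platform translates it into something along these lines (this JSON is a sketch of a plain DeepSpeed config, not Langtrain's actual generated file):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_clipping": 1.0,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true
  }
}
```

ZeRO stage 3 shards weights, gradients, and optimizer states across all 8 GPUs, which is what makes a 13B full fine-tune feasible on this hardware.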

Monitoring Training

Monitor your full fine-tuning progress:

  • Metrics: track loss, learning rate, and validation metrics
  • Early Stopping: automatic early stopping to prevent overfitting
import time

# Monitor training progress by polling the job until it reaches a terminal state
while job.status in ["queued", "running"]:
    job = client.fine_tune.get(job.id)

    if job.metrics:
        print(f"Step: {job.metrics.step}")
        print(f"Training Loss: {job.metrics.train_loss:.4f}")
        print(f"Validation Loss: {job.metrics.eval_loss:.4f}")

    time.sleep(60)

print(f"Training completed with status: {job.status}")

Best Practices

Tips for successful full fine-tuning:

  • Learning Rate: start low, typically in the 5e-6 to 1e-5 range
  • Regularization: use weight decay and dropout to prevent overfitting
  • Validation: always use a validation set to monitor generalization
# Best practices configuration
best_practices_config = {
    "method": "full",
    "learning_rate": 2e-6,     # conservative learning rate
    "weight_decay": 0.01,      # L2 regularization
    "dropout": 0.1,
    "gradient_clipping": 1.0,
    "early_stopping": {
        "patience": 3,         # stop after 3 evals without improvement
        "metric": "eval_loss",
        "min_delta": 0.001,
    },
    "load_best_model_at_end": True,
}