Docs

Getting Started

  • Introduction
  • Quick Start
  • Installation

Fine-tuning

  • LoRA & QLoRA
  • Full Fine-tuning

API & SDK

  • REST API
  • Python SDK

Deployment

  • Cloud Deployment
  • Security

Resources

  • FAQ
  • Changelog

Full Fine-tuning

Complete guide to full parameter fine-tuning for maximum model customization and performance.

When to Use Full Fine-tuning

Full fine-tuning is recommended when:

Maximum Performance: You need the absolute best performance for your task

Domain Adaptation: Adapting to a very different domain from the base model

Task Specialization: Creating highly specialized models for specific tasks
```python
# Full fine-tuning configuration
config = {
    "method": "full",
    "model": "llama-2-7b",
    "learning_rate": 1e-5,
    "batch_size": 8,
    "gradient_accumulation_steps": 4,
    "epochs": 3,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "optimizer": "adamw",
    "scheduler": "cosine"
}
```
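Note that `batch_size` and `gradient_accumulation_steps` combine multiplicatively: gradients from several micro-batches are summed before each optimizer step, so the optimizer sees a larger effective batch. A quick helper to sanity-check this (illustrative only, not part of the SDK):

```python
def effective_batch_size(batch_size: int, accumulation_steps: int, gpu_count: int = 1) -> int:
    """Effective batch per optimizer step: per-device batch x accumulation steps x GPUs."""
    return batch_size * accumulation_steps * gpu_count

# The configuration above: 8 per-device batch x 4 accumulation steps
print(effective_batch_size(8, 4))  # -> 32
```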

Dataset Preparation

Prepare your dataset for full fine-tuning:

Data Quality: High-quality, diverse training data is crucial

Format: Support for various formats including JSONL, CSV, and Parquet
```python
# Example training data format (one JSON object per line in a JSONL file)
{
    "instruction": "Summarize the following text:",
    "input": "Large language models have shown remarkable capabilities...",
    "output": "LLMs demonstrate strong performance across many NLP tasks."
}

# Upload dataset
dataset = client.datasets.upload(
    file_path="full_training_data.jsonl",
    name="full-finetune-dataset",
    validation_split=0.1
)
```
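Before uploading, it can help to verify that every record carries the expected keys, since a single malformed line can fail the whole job. A minimal check, assuming the instruction/input/output format shown above (hypothetical helper, not an SDK function):

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}

def validate_jsonl(path: str) -> list[int]:
    """Return 1-based line numbers of records missing any required key."""
    bad_lines = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            if not REQUIRED_KEYS <= record.keys():
                bad_lines.append(i)
    return bad_lines
```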

Training Configuration

Configure your full fine-tuning job:

Hardware Requirements: Full fine-tuning updates every model parameter, so it requires significant GPU memory for weights, gradients, and optimizer states

Training Time: Longer training times compared to LoRA methods
```python
# Start full fine-tuning job
job = client.fine_tune.create(
    model="mistral-7b",
    dataset=dataset.id,
    config={
        "method": "full",
        "learning_rate": 5e-6,
        "batch_size": 4,
        "epochs": 2,
        "gradient_checkpointing": True,
        "fp16": True,
        "deepspeed_stage": 2,
        "save_steps": 500,
        "logging_steps": 100,
        "evaluation_strategy": "steps",
        "eval_steps": 500
    }
)

print(f"Full fine-tuning job started: {job.id}")
```
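As a back-of-envelope check on the memory requirement: full fine-tuning with Adam in fp16 mixed precision typically needs roughly 16 bytes per parameter for model states alone (fp16 weights and gradients, plus an fp32 master copy and two fp32 Adam moments), before activations. A rough estimate under those assumptions (not an SDK calculation):

```python
def full_ft_memory_gb(num_params: float) -> float:
    """Approximate model-state memory for full fine-tuning with Adam in
    fp16 mixed precision: 2 B weights + 2 B grads + 12 B optimizer states
    (fp32 master copy + two Adam moments) per parameter. Activations excluded."""
    bytes_per_param = 2 + 2 + 4 + 4 + 4  # = 16
    return num_params * bytes_per_param / 1024**3

print(f"{full_ft_memory_gb(7e9):.0f} GB")  # ~104 GB for a 7B model, before activations
```

This is why gradient checkpointing, fp16, and DeepSpeed appear in the configuration above: a 7B model already exceeds any single GPU without them.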

Distributed Training

Scale your training across multiple GPUs:

Multi-GPU: Automatic data parallelism across available GPUs

DeepSpeed: Integration with DeepSpeed for memory-efficient training
```python
# Distributed training configuration
distributed_config = {
    "method": "full",
    "distributed": {
        "strategy": "deepspeed",
        "stage": 3,  # ZeRO stage 3 for maximum memory efficiency
        "gradient_clipping": 1.0,
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8
    },
    "hardware": {
        "gpu_count": 8,
        "instance_type": "gpu-large",
        "gradient_accumulation_steps": 16
    }
}

# Launch distributed training
job = client.fine_tune.create(
    model="llama-2-13b",
    dataset=dataset.id,
    config=distributed_config
)
```
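Under ZeRO stage 3, parameters, gradients, and optimizer states are all partitioned across the data-parallel group, so per-GPU model-state memory shrinks roughly linearly with GPU count (activations are extra and are not partitioned this way). A sketch using the same ~16 bytes/parameter approximation as above (illustrative, not an SDK calculation):

```python
def zero3_per_gpu_gb(num_params: float, gpu_count: int) -> float:
    """Approximate per-GPU model-state memory under ZeRO stage 3:
    weights, gradients, and Adam optimizer states (~16 B/param total)
    are partitioned evenly across all data-parallel ranks."""
    return num_params * 16 / gpu_count / 1024**3

# 13B model on 8 GPUs, as in the configuration above
print(f"{zero3_per_gpu_gb(13e9, 8):.0f} GB per GPU")  # ~24 GB before activations
```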

Monitoring Training

Monitor your full fine-tuning progress:

Metrics: Track loss, learning rate, and validation metrics

Early Stopping: Automatic early stopping to prevent overfitting
```python
import time

# Poll the job until it leaves the queued/running states
while job.status in ["queued", "running"]:
    job = client.fine_tune.get(job.id)  # refresh status and metrics

    if job.metrics:
        print(f"Step: {job.metrics.step}")
        print(f"Training Loss: {job.metrics.train_loss:.4f}")
        print(f"Validation Loss: {job.metrics.eval_loss:.4f}")
        print(f"Learning Rate: {job.metrics.learning_rate:.2e}")

    time.sleep(60)  # avoid hammering the API

print(f"Training completed with status: {job.status}")
```

Best Practices

Tips for successful full fine-tuning:

Learning Rate: Start with a low learning rate (5e-6 to 1e-5); full fine-tuning diverges at rates that LoRA tolerates

Regularization: Use weight decay and dropout to prevent overfitting

Validation: Always use a validation set to monitor generalization
```python
# Best practices configuration
best_practices_config = {
    "method": "full",
    "learning_rate": 2e-6,     # conservative learning rate
    "weight_decay": 0.01,      # L2 regularization
    "dropout": 0.1,            # dropout for regularization
    "gradient_clipping": 1.0,  # prevent gradient explosion
    "early_stopping": {
        "patience": 3,
        "metric": "eval_loss",
        "min_delta": 0.001
    },
    "save_strategy": "epoch",
    "load_best_model_at_end": True
}
```
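The early_stopping block above follows the usual patience/min_delta convention: training stops after `patience` consecutive evaluations in which eval_loss fails to improve by at least `min_delta`. A minimal sketch of that logic (a standalone illustration of the convention, not the platform's implementation):

```python
class EarlyStopping:
    """Stop after `patience` evaluations with < `min_delta` improvement."""

    def __init__(self, patience: int = 3, min_delta: float = 0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss: float) -> bool:
        if eval_loss < self.best - self.min_delta:
            self.best = eval_loss  # meaningful improvement: reset counter
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

Note that a tiny improvement below `min_delta` still counts as a bad evaluation, which is what keeps training from limping along on noise-level gains.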
