Core Concepts

Understanding these fundamental concepts will help you make the most of LangTrain's platform for training and deploying AI models.

🔗 Unified Platform: All ML concepts integrated in one coherent platform

🎯 Auto-Optimization: Automatic hyperparameter tuning and model optimization

📈 Scalable Infrastructure: From prototype to production with the same tools

📊 Comprehensive Monitoring: Track everything from training to production performance

Training Modes

LangTrain offers multiple training approaches to fit different workflows and use cases.

🖥️ Interactive Mode:
Real-time training with a web-based dashboard. Perfect for experimentation and learning.

Key Features:
- ✅ Live metrics and visualizations
- ✅ Adjust parameters during training
- ✅ Early stopping controls
- ✅ Interactive debugging and monitoring
- ✅ Real-time loss and accuracy tracking
- ✅ Visual training progress indicators

⚙️ Batch Mode:
Automated training using configuration files. Ideal for production workflows.

Key Features:
- ✅ Reproducible training runs
- ✅ CI/CD integration support
- ✅ Scheduled training jobs
- ✅ Advanced scaling options
- ✅ Configuration-driven workflows
- ✅ Automated model versioning

When to Use Each Mode:
- Interactive Mode: Experimentation, prototyping, learning, small datasets
- Batch Mode: Production workflows, large datasets, automated pipelines, scheduled retraining
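
Reproducibility in a configuration-driven workflow usually rests on being able to tell whether two runs used the same settings. The sketch below hashes a configuration dictionary into a short fingerprint. The configuration keys and the `config_fingerprint` helper are hypothetical and are not LangTrain's actual schema or API.

```python
import hashlib
import json

# Hypothetical batch-run configuration; the keys are illustrative only.
config = {
    "model": {"name": "email-classifier", "type": "text-classification"},
    "dataset": "./email_data.json",
    "training": {"validation_split": 0.2, "seed": 42},
}


def config_fingerprint(cfg: dict) -> str:
    """Return a short, stable hash of a config so identical runs can be recognized."""
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


print(config_fingerprint(config))
```

Storing the fingerprint next to each trained model version is one simple way to trace a model back to the settings that produced it.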

Models

In LangTrain, a model is the core entity that learns patterns from your data and makes predictions.

Model Types:
- Text Classification: Categorize text into predefined classes
- Language Generation: Generate human-like text responses
- Embeddings: Convert text into numerical vectors
- Named Entity Recognition: Identify entities in text
- Custom Models: Upload your own model architectures

Model Lifecycle:
1. Creation: Define model type and configuration
2. Training: Learn from your training data
3. Evaluation: Test performance on validation data
4. Deployment: Make available for inference
5. Monitoring: Track performance in production

Model Versions:
Each training run creates a new model version, allowing you to track improvements and roll back if needed.
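
The idea of one version per training run, with the option to roll back, can be illustrated with a toy in-memory registry. `VersionHistory` and its methods are hypothetical names used for illustration; they are not part of the LangTrain client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionHistory:
    """Toy registry: one entry per training run, plus a pointer to the active version."""

    metrics_by_version: List[Dict[str, float]] = field(default_factory=list)
    active: Optional[int] = None

    def record_run(self, metrics: Dict[str, float]) -> int:
        """Store a run's metrics and make the new version active. Versions start at 1."""
        self.metrics_by_version.append(metrics)
        self.active = len(self.metrics_by_version)
        return self.active

    def roll_back(self, version: int) -> None:
        """Point `active` at an earlier version without deleting later ones."""
        if not 1 <= version <= len(self.metrics_by_version):
            raise ValueError(f"unknown version {version}")
        self.active = version


history = VersionHistory()
history.record_run({"accuracy": 0.91})
history.record_run({"accuracy": 0.88})  # a regression compared with version 1
history.roll_back(1)
print(history.active)  # 1
```

Rolling back only moves the pointer, so the later version remains available for comparison.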

Training

Training is the process where your model learns patterns from data to solve your specific problem.

Training Process:
1. Data Preparation: Clean and format your training data
2. Hyperparameter Selection: Choose learning rate, batch size, etc.
3. Model Training: The model learns from your data
4. Validation: Test performance on held-out data
5. Checkpointing: Save model state at regular intervals

Auto-Tuning:
LangTrain's auto-tuning feature automatically optimizes hyperparameters:
- Learning rate schedules
- Batch sizes and optimization algorithms
- Architecture parameters
- Regularization settings
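
The source does not say which search method auto-tuning uses. As a rough illustration of what automated hyperparameter search does, here is a minimal random search, one common approach. The search space, `random_search`, and the toy objective are all assumptions made for this sketch.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample random configurations and keep the one with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for illustration.
space = {
    "learning_rate": [1e-5, 2e-5, 5e-5, 1e-4],
    "batch_size": [8, 16, 32, 64],
}


def toy_objective(params):
    """Stand-in for a validation loss; a real objective would train and evaluate a model."""
    return abs(params["learning_rate"] - 5e-5) * 1e4 + abs(params["batch_size"] - 32) / 32


best_params, best_score = random_search(toy_objective, space)
print(best_params, best_score)
```

In practice each call to the objective is a full training run, which is why the number of trials is the main cost lever.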

Distributed Training:
For large datasets, LangTrain automatically distributes training across multiple GPUs and machines for faster results.
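
One common way to spread training across workers is data parallelism, where each worker processes a disjoint slice of the dataset. The sketch below shows strided sharding of sample indices. It illustrates the general technique only; how LangTrain partitions data internally is not described in the source, and `shard_indices` is a hypothetical helper.

```python
def shard_indices(n_samples: int, num_workers: int, rank: int) -> list:
    """Return the sample indices assigned to worker `rank` under strided sharding."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    # Worker r takes samples r, r + num_workers, r + 2 * num_workers, ...
    return list(range(rank, n_samples, num_workers))


shards = [shard_indices(10, 4, rank) for rank in range(4)]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Every sample lands in exactly one shard, and shard sizes differ by at most one.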

Datasets

Datasets contain the training examples your model learns from.

Data Formats:
- JSON: Structured data with labels
- CSV: Tabular data with headers
- Text Files: Raw text for language modeling
- Parquet: Efficient columnar format for large datasets

Dataset Types:
- Training Set: Data used to train the model
- Validation Set: Data used to evaluate during training
- Test Set: Final evaluation data (never seen during training)
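
A minimal sketch of producing the three sets from one pool of examples, using only the standard library. The function name, the default fractions, and the seed are assumptions chosen for illustration.

```python
import random


def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    if val_fraction + test_fraction >= 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Because the test set is separated before any training happens, it stays unseen until the final evaluation.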

Data Quality:
- Balanced Classes: Ensure equal representation
- Clean Labels: Accurate and consistent annotations
- Sufficient Volume: Enough examples for learning
- Diverse Examples: Cover edge cases and variations
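
Class balance is easy to check before training. The sketch below computes each label's share and the ratio between the largest and smallest class. The helper names and the sample labels are made up for illustration.

```python
from collections import Counter


def label_distribution(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


labels = ["spam"] * 20 + ["not_spam"] * 80
print(label_distribution(labels))  # {'spam': 0.2, 'not_spam': 0.8}
print(imbalance_ratio(labels))     # 4.0
```

A ratio close to 1 indicates balanced classes. A large ratio suggests collecting more minority examples or resampling.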

Data Management:
LangTrain provides tools for versioning, previewing, and validating your datasets.

Evaluation

Evaluation measures how well your model performs on unseen data.

Metrics by Task:
- Classification: Accuracy, precision, recall, F1-score
- Generation: BLEU, ROUGE, perplexity
- Regression: MAE, MSE, R-squared
- Custom: Define your own evaluation metrics
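
For reference, the classification metrics above can be computed by hand from true and predicted labels. This is a plain-Python sketch for the binary case; the function name and sample data are illustrative, and the platform presumably computes these for you.

```python
def binary_classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = ["spam", "spam", "not_spam", "not_spam", "spam"]
y_pred = ["spam", "not_spam", "not_spam", "spam", "spam"]
metrics = binary_classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Here two of the three spam emails are caught and one legitimate email is flagged, so precision and recall are both 2/3 while accuracy is 3/5.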

Evaluation Types:
- Automatic Evaluation: Computed metrics on test data
- Human Evaluation: Manual review of model outputs
- A/B Testing: Compare models in production
- Cross-Validation: Multiple train/test splits
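
Cross-validation with multiple train/test splits can be sketched as a generator of index lists, where each sample serves as test data exactly once. `k_fold_indices` is a hypothetical helper written for this illustration and does not shuffle; shuffle the data first if its order is meaningful.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```

Averaging a metric over the folds gives a more stable estimate than a single split, at the cost of k training runs.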

Performance Tracking:
- Real-time metrics during training
- Detailed evaluation reports
- Model comparison dashboards
- Performance history and trends

Fine-tuning

Fine-tuning adapts pre-trained models to your specific use case.

Benefits:
- Faster Training: Start with pre-learned features
- Better Performance: Leverage existing knowledge
- Less Data: Requires fewer training examples
- Cost Effective: Reduced computation requirements

Fine-tuning Strategies:
- Full Fine-tuning: Update all model parameters
- Partial Fine-tuning: Update only specific layers
- LoRA: Low-rank adaptation for efficient tuning
- Prompt Tuning: Learn a small set of prompt embeddings while the base model stays frozen
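
To see why LoRA is cheaper than full fine-tuning, compare trainable parameter counts for a single weight matrix. LoRA keeps the original `d_out x d_in` matrix frozen and trains two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`). The rank of 8 below is an assumed value for illustration.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


d_in = d_out = 768  # hidden size of bert-base-uncased
rank = 8            # assumed LoRA rank, chosen for illustration

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(full, lora, f"{lora / full:.2%}")  # 589824 12288 2.08%
```

For this one matrix, LoRA trains about 2% of the parameters that full fine-tuning would update.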

Best Practices:
- Use smaller learning rates than training from scratch
- Monitor for overfitting on small datasets
- Validate on data similar to your target domain
- Consider domain adaptation techniques
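
Monitoring for overfitting often comes down to watching validation loss and stopping once it no longer improves. The sketch below shows a patience-based stopping rule; the function name, the default patience, and the loss values are assumptions for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Validation loss improves for three epochs, then rises: a typical overfitting pattern.
losses = [0.70, 0.55, 0.48, 0.50, 0.53, 0.60]
print(early_stopping_epoch(losses))  # 4
```

The best checkpoint in this example is the one saved at epoch 2, before validation loss started to climb.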

Deployment

Deployment makes your trained models available for real-world use.

Deployment Options:
- REST API: HTTP endpoints for web applications
- Batch Processing: Process large datasets offline
- Real-time Streaming: Handle live data streams
- Edge Deployment: Run models on device

Scaling:
- Auto-scaling: Automatically adjust capacity
- Load Balancing: Distribute requests efficiently
- Caching: Speed up repeated requests
- Regional Deployment: Reduce latency globally

Monitoring:
- Performance Metrics: Latency, throughput, errors
- Model Drift: Detect changes in data patterns
- Cost Tracking: Monitor usage and expenses
- Health Checks: Automated system monitoring
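
One simple way to flag drift is to compare the label distribution seen in production against the one seen at training time. The sketch below uses total variation distance with an assumed threshold of 0.1. The function names, distributions, and threshold are illustrative and do not describe LangTrain's own drift detection.

```python
def total_variation_distance(p: dict, q: dict) -> float:
    """Half the sum of absolute differences between two discrete distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


def drift_detected(reference: dict, live: dict, threshold: float = 0.1) -> bool:
    """Flag drift when the distributions differ by more than the threshold."""
    return total_variation_distance(reference, live) > threshold


reference = {"spam": 0.2, "not_spam": 0.8}   # label shares at training time
live = {"spam": 0.35, "not_spam": 0.65}      # label shares observed in production

print(f"{total_variation_distance(reference, live):.2f}")  # 0.15
print(drift_detected(reference, live))                     # True
```

The distance ranges from 0 (identical distributions) to 1 (no overlap), which makes the threshold easy to reason about.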

Full Examples

Create a Model

python

import langtrain

client = langtrain.LangTrain()

# Create a text classification model
model = client.models.create(
    name="email-classifier",
    type="text-classification",
    description="Classify emails as spam or not spam",
    labels=["spam", "not_spam"]
)

print(f"Created model: {model.id}")

Upload and Manage Dataset

python

# Upload training data
dataset = client.datasets.upload(
    name="email-training-data",
    file_path="./email_data.json",
    format="json"
)

# Preview dataset
preview = dataset.preview(n_samples=5)
print("Sample data:")
for sample in preview:
    print(f"Text: {sample['text'][:50]}...")
    print(f"Label: {sample['label']}")

# Get dataset statistics
stats = dataset.get_stats()
print(f"Total samples: {stats['total_samples']}")
print(f"Label distribution: {stats['label_distribution']}")

Train with Evaluation

python

# Start training with automatic evaluation
training_job = model.train(
    dataset_id=dataset.id,
    validation_split=0.2,
    auto_tune=True,
    evaluation_metrics=["accuracy", "f1", "precision", "recall"]
)

# Monitor training progress
for update in training_job.stream_progress():
    print(f"Step {update.step}: Loss={update.loss:.4f}, Accuracy={update.accuracy:.3f}")

# Get final evaluation results
results = training_job.get_results()
print(f"Final accuracy: {results.metrics['accuracy']:.3f}")
print(f"F1 score: {results.metrics['f1']:.3f}")

Fine-tune Pre-trained Model

python

# Fine-tune from a pre-trained model
model = client.models.create(
    name="custom-sentiment-model",
    type="text-classification",
    base_model="bert-base-uncased",  # Start from pre-trained BERT
    fine_tuning_config={
        "learning_rate": 2e-5,
        "num_epochs": 3,
        "strategy": "full"  # or "lora" for efficient tuning
    }
)

# Train on your specific data
# (your_dataset is a placeholder for a dataset you uploaded earlier)
training_job = model.fine_tune(
    dataset_id=your_dataset.id,
    validation_split=0.15
)

print(f"Fine-tuning started: {training_job.id}")
