Monitoring

Real-time monitoring and observability for your models and training processes.

Real-time Monitoring

Track training progress, model performance, and system metrics with LangTrain's monitoring dashboard, and get real-time insight into your model's behavior.

Key Monitoring Features:

  • Training progress and loss curves
  • Model performance metrics
  • Resource utilization (GPU, CPU, memory)
  • Data quality monitoring
  • Error tracking and alerting
  • Custom dashboards and visualizations
python
from langtrain import Monitor

# Initialize monitoring
monitor = Monitor(
    project_name='my_project',
    experiment_name='bert_fine_tuning',
    tracking_uri='http://localhost:5000'
)

# Start monitoring training
monitor.start_training(
    model=model,
    train_data=train_dataset,
    val_data=val_dataset,
    metrics=['loss', 'accuracy', 'f1_score'],
    log_frequency=100  # Log every 100 steps
)

# Log custom metrics
monitor.log_metric('learning_rate', 0.001, step=epoch)
monitor.log_metric('batch_size', 32)
monitor.log_artifact('model_config.json', config)

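Raw step-level loss values are noisy, so loss curves are usually plotted with a smoothed overlay. Below is a minimal sketch of the common smoothing technique, an exponential moving average (the approach TensorBoard-style viewers use); it is plain Python, not part of the LangTrain API.

python
def smooth(values, weight=0.9):
    """Exponential moving average over a metric series.

    weight=0.0 returns the raw values; weights near 1.0 smooth heavily.
    """
    smoothed, last = [], None
    for v in values:
        last = v if last is None else weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: smooth a noisy loss series before plotting
raw_losses = [2.3, 1.9, 2.1, 1.6, 1.7, 1.4, 1.5, 1.2]
print(smooth(raw_losses))
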
Performance Monitoring

Monitor model performance across different dimensions and detect degradation early.
python
# Performance monitoring setup
from langtrain.monitoring import PerformanceMonitor

perf_monitor = PerformanceMonitor(
    model=model,
    baseline_metrics={
        'accuracy': 0.92,
        'latency_p95': 100,  # milliseconds
        'throughput': 1000   # requests/second
    }
)

# Monitor inference performance
@perf_monitor.track_inference
def predict(inputs):
    return model.predict(inputs)

# Set up alerts for performance degradation
perf_monitor.set_alert(
    metric='accuracy',
    threshold=0.85,
    comparison='less_than',
    action='email_alert'
)

# Generate performance reports
report = perf_monitor.generate_report(
    time_range='last_7_days',
    include_trends=True
)

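For intuition about what a decorator like `track_inference` has to do, the standalone sketch below times each call and reports a percentile latency. It is a simplified illustration, not LangTrain's implementation.

python
import time
from functools import wraps

class LatencyTracker:
    """Records wall-clock latency per call and reports percentiles."""

    def __init__(self):
        self.samples_ms = []

    def track(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.samples_ms.append((time.perf_counter() - start) * 1000)
        return wrapper

    def percentile(self, p):
        # Nearest-rank approximation over the recorded samples
        ordered = sorted(self.samples_ms)
        idx = min(len(ordered) - 1, int(len(ordered) * p / 100))
        return ordered[idx]

tracker = LatencyTracker()

@tracker.track
def predict(x):
    time.sleep(0.01)  # stand-in for real model inference
    return x

for i in range(100):
    predict(i)
print(f"p95 latency: {tracker.percentile(95):.1f} ms")
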
System Resource Monitoring

Monitor system resources to optimize performance and prevent bottlenecks.
python
# System resource monitoring
from langtrain.monitoring import SystemMonitor

sys_monitor = SystemMonitor(
    track_gpu=True,
    track_memory=True,
    track_disk=True,
    track_network=True
)

# Start system monitoring
sys_monitor.start()

# Get current resource usage
resources = sys_monitor.get_current_usage()
print(f"GPU Utilization: {resources['gpu_utilization']}%")
print(f"Memory Usage: {resources['memory_usage']}%")
print(f"Disk I/O: {resources['disk_io']} MB/s")

# Set resource alerts
sys_monitor.set_alert(
    metric='gpu_memory',
    threshold=90,  # Alert at 90% GPU memory usage
    action='scale_resources'
)

# Log resource metrics
sys_monitor.log_to_dashboard(dashboard_url='http://grafana:3000')

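For reference, resource monitors like this typically poll OS and driver counters. The sketch below collects comparable numbers directly with the psutil package (and pynvml for NVIDIA GPUs, if installed); it mirrors the shape of `get_current_usage()` but is not LangTrain code.

python
import psutil

def current_usage():
    """Poll CPU, memory, and (if available) GPU utilization."""
    usage = {
        'cpu_utilization': psutil.cpu_percent(interval=0.1),
        'memory_usage': psutil.virtual_memory().percent,
    }
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        usage['gpu_utilization'] = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        usage['gpu_memory'] = 100 * mem.used / mem.total
        pynvml.nvmlShutdown()
    except Exception:
        # pynvml missing or no NVIDIA GPU present
        usage['gpu_utilization'] = None
    return usage

print(current_usage())
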
Data Quality Monitoring

Monitor data quality and detect drift in production data streams.
python
# Data quality monitoring
from langtrain.monitoring import DataMonitor

data_monitor = DataMonitor(
    reference_data=training_data,
    feature_columns=['text_length', 'sentiment_score'],
    categorical_columns=['category', 'language']
)

# Monitor incoming data
@data_monitor.track_data_quality
def process_batch(batch_data):
    # Your data processing logic
    predictions = model.predict(batch_data)
    return predictions

# Detect data drift
drift_report = data_monitor.detect_drift(
    new_data=production_data,
    drift_methods=['ks_test', 'chi_square', 'jensen_shannon']
)

if drift_report.has_drift:
    print(f"Data drift detected in features: {drift_report.drifted_features}")

# Set up data quality alerts
data_monitor.configure_alerts(
    drift_threshold=0.1,
    quality_threshold=0.95,
    notification_channels=['email', 'slack']
)

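Of the drift methods listed above, `ks_test` is the classic two-sample Kolmogorov-Smirnov test: it compares a feature's distribution in the reference data against new data. A minimal standalone version using scipy:

python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # training-time feature values
production = rng.normal(loc=0.3, scale=1.0, size=5000)  # shifted production values

stat, p_value = ks_2samp(reference, production)
if p_value < 0.05:
    print(f"Drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift")
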
Custom Dashboards

Create custom dashboards to visualize metrics that matter most to your use case.
python
# Custom dashboard creation
from langtrain.monitoring import Dashboard

dashboard = Dashboard(name='Model Performance Dashboard')

# Add metric widgets
dashboard.add_widget(
    type='line_chart',
    title='Training Loss',
    metrics=['train_loss', 'val_loss'],
    time_range='last_24_hours'
)

dashboard.add_widget(
    type='gauge',
    title='Current Accuracy',
    metric='accuracy',
    min_value=0.0,
    max_value=1.0,
    threshold_ranges=[
        {'min': 0.0, 'max': 0.7, 'color': 'red'},
        {'min': 0.7, 'max': 0.85, 'color': 'yellow'},
        {'min': 0.85, 'max': 1.0, 'color': 'green'}
    ]
)

dashboard.add_widget(
    type='table',
    title='Model Comparison',
    data_source='model_comparison_results',
    columns=['model_name', 'accuracy', 'f1_score', 'latency']
)

# Deploy dashboard
dashboard.deploy(url='http://monitoring:8080/dashboard')

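As a quick illustration of how a gauge's `threshold_ranges` map a metric value to a color, here is the lookup in plain Python (not part of the Dashboard API):

python
def gauge_color(value, threshold_ranges):
    """Return the color whose [min, max) range contains the value."""
    for r in threshold_ranges:
        if r['min'] <= value < r['max']:
            return r['color']
    return threshold_ranges[-1]['color']  # value == max_value lands in the top range

ranges = [
    {'min': 0.0, 'max': 0.7, 'color': 'red'},
    {'min': 0.7, 'max': 0.85, 'color': 'yellow'},
    {'min': 0.85, 'max': 1.0, 'color': 'green'},
]
print(gauge_color(0.91, ranges))  # -> green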