AWS AI Services
Amazon SageMaker: Building Custom ML Models for Australian Businesses
Comprehensive guide to using Amazon SageMaker for building, training, and deploying custom machine learning models tailored to your business needs.
CloudPoint Team
While pre-built AI services work for many use cases, some Australian businesses need custom machine learning models trained on their specific data. Amazon SageMaker provides a complete platform for building, training, and deploying custom ML models without managing infrastructure.
When to Use SageMaker
Use SageMaker when:
- Pre-built AI services don’t fit your use case
- You have domain-specific data
- You need custom model logic
- You have ML expertise in-house
- You can gain a unique competitive advantage from custom models
Use pre-built AI services when:
- Standard use cases (sentiment, transcription, etc.)
- Limited ML expertise
- Quick time-to-market
- Cost-sensitive
SageMaker Components
SageMaker Studio
Integrated development environment for ML:
- Jupyter notebooks
- Experiment tracking
- Model registry
- Pipeline management
- Team collaboration
SageMaker Training
Managed infrastructure for model training:
- Automatic scaling
- Distributed training
- Spot instances support
- Built-in algorithms
- Custom algorithms
SageMaker Inference
Deploy models for predictions:
- Real-time endpoints
- Batch transform
- Serverless inference
- Multi-model endpoints
- Auto-scaling
SageMaker Pipelines
Orchestrate ML workflows:
- Data processing
- Training
- Evaluation
- Deployment
- CI/CD for ML
Building a Custom Model
Example: Customer Churn Prediction
An Australian telco wants to predict which customers will churn.
Step 1: Prepare Data
import pandas as pd
import boto3
import sagemaker
from sagemaker import get_execution_role
# Load data
df = pd.read_csv('customer_data.csv')
# Features
features = [
'tenure_months',
'monthly_charges',
'total_charges',
'contract_type',
'payment_method',
'internet_service',
'support_calls_last_month'
]
target = 'churned'
# Prepare for SageMaker built-in XGBoost: target column first,
# no header, no index
train_data = df[[target] + features]
train_data.to_csv('train.csv', index=False, header=False)
# Upload to S3
session = sagemaker.Session()
bucket = session.default_bucket()
train_path = session.upload_data(
    'train.csv',
    bucket=bucket,
    key_prefix='churn-model/data'
)
Step 2: Train Model
Using SageMaker’s built-in XGBoost:
from sagemaker.estimator import Estimator
role = get_execution_role()
# Configure XGBoost
xgboost = Estimator(
image_uri=sagemaker.image_uris.retrieve('xgboost', 'ap-southeast-2', '1.5-1'),
role=role,
instance_count=1,
instance_type='ml.m5.xlarge',
output_path=f's3://{bucket}/churn-model/output',
sagemaker_session=session,
use_spot_instances=True, # Save up to 90%
max_run=3600,
max_wait=7200
)
# Set hyperparameters
xgboost.set_hyperparameters(
objective='binary:logistic',
num_round=100,
max_depth=5,
eta=0.2,
subsample=0.8,
colsample_bytree=0.8
)
# Train model (built-in XGBoost needs the content type declared)
from sagemaker.inputs import TrainingInput
xgboost.fit({'train': TrainingInput(train_path, content_type='text/csv')})
Step 3: Deploy Model
from sagemaker.serializers import CSVSerializer

# Deploy to real-time endpoint
predictor = xgboost.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',  # Small instance for low traffic
    endpoint_name='churn-prediction-endpoint',
    serializer=CSVSerializer()  # Endpoint expects CSV input
)
# Make predictions
test_data = [[
12, # tenure_months
79.99, # monthly_charges
960.00, # total_charges
1, # contract_type (encoded)
0, # payment_method (encoded)
1, # internet_service (encoded)
3 # support_calls_last_month
]]
prediction = predictor.predict(test_data)
# Response body is a CSV byte string, e.g. b'0.42'
print(f"Churn probability: {float(prediction):.2%}")
Step 4: Monitor and Update
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor import CronExpressionGenerator
# Create model monitor
monitor = DefaultModelMonitor(
role=role,
instance_count=1,
instance_type='ml.m5.large',
max_runtime_in_seconds=1800
)
# Create baseline from training data
monitor.suggest_baseline(
baseline_dataset=train_path,
dataset_format={'csv': {'header': True}},
output_s3_uri=f's3://{bucket}/churn-model/baseline'
)
# Schedule monitoring (hourly)
monitor.create_monitoring_schedule(
endpoint_input=predictor.endpoint_name,
output_s3_uri=f's3://{bucket}/churn-model/monitoring',
statistics=f's3://{bucket}/churn-model/baseline/statistics.json',
constraints=f's3://{bucket}/churn-model/baseline/constraints.json',
schedule_cron_expression=CronExpressionGenerator.hourly()
)
Cost Optimisation
Training Costs
Use Spot Instances:
estimator = Estimator(
...
use_spot_instances=True,
max_wait=7200 # Willing to wait for spot
)
Savings: Up to 90% vs on-demand
Managed Spot Training:
- SageMaker handles interruptions
- Automatically resumes training
- Checkpoints progress
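The checkpointing behaviour above is configured on the estimator itself; a sketch reusing the earlier XGBoost setup (the S3 prefix is illustrative):

```python
estimator = Estimator(
    ...
    use_spot_instances=True,
    max_run=3600,
    max_wait=7200,
    # SageMaker syncs this S3 prefix with checkpoint_local_path inside
    # the container, so an interrupted spot job resumes from the last
    # checkpoint instead of starting over
    checkpoint_s3_uri=f's3://{bucket}/churn-model/checkpoints',
    checkpoint_local_path='/opt/ml/checkpoints'
)
```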
Inference Costs
Serverless Inference:
For infrequent or variable traffic:
from sagemaker.serverless import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(
memory_size_in_mb=2048,
max_concurrency=10
)
predictor = model.deploy(
serverless_inference_config=serverless_config
)
Benefits:
- Pay per request
- No charge when idle
- Auto-scaling
- Good for < 10 requests/minute
Auto-scaling for Real-time Endpoints:
import boto3
client = boto3.client('application-autoscaling')
# Register scalable target
client.register_scalable_target(
ServiceNamespace='sagemaker',
ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
ScalableDimension='sagemaker:variant:DesiredInstanceCount',
MinCapacity=1,
MaxCapacity=5
)
# Target tracking policy
client.put_scaling_policy(
PolicyName='target-tracking-scaling-policy',
ServiceNamespace='sagemaker',
ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
ScalableDimension='sagemaker:variant:DesiredInstanceCount',
PolicyType='TargetTrackingScaling',
TargetTrackingScalingPolicyConfiguration={
'TargetValue': 70.0, # Target invocations per instance per minute
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
}
}
)
Multi-Model Endpoints
Host multiple models on the same endpoint:
from sagemaker.multidatamodel import MultiDataModel
mdm = MultiDataModel(
name='multi-model-endpoint',
model_data_prefix=f's3://{bucket}/models/',
image_uri=container_image,
role=role
)
predictor = mdm.deploy(
initial_instance_count=1,
instance_type='ml.m5.xlarge'
)
# Invoke specific model
prediction = predictor.predict(
data=test_data,
target_model='model-v2.tar.gz'
)
Savings: Share infrastructure across models
MLOps with SageMaker Pipelines
Automate ML workflows:
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.step_collections import RegisterModel
# Data processing step
processing_step = ProcessingStep(
name='PrepareData',
processor=processor,
inputs=[...],
outputs=[...],
code='preprocess.py'
)
# Training step
training_step = TrainingStep(
name='TrainModel',
estimator=xgboost,
inputs={
'train': TrainingInput(
s3_data=processing_step.properties.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri
)
}
)
# Model registration
register_step = RegisterModel(
name='RegisterModel',
estimator=xgboost,
model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
content_types=['text/csv'],
response_types=['text/csv'],
inference_instances=['ml.t2.medium', 'ml.m5.large'],
transform_instances=['ml.m5.large'],
model_package_group_name='ChurnPredictionModels',
approval_status='PendingManualApproval'
)
# Create pipeline
pipeline = Pipeline(
name='ChurnPredictionPipeline',
parameters=[...],
steps=[processing_step, training_step, register_step]
)
# Execute
pipeline.upsert(role_arn=role)
execution = pipeline.start()
SageMaker Ground Truth
Label training data efficiently:
Built-in Labeling Workflows:
- Image classification
- Object detection
- Text classification
- Named entity recognition
Custom Workflows:
- Define custom UI
- Use your labeling instructions
Active Learning:
- Auto-labeling with high-confidence predictions
- Humans label uncertain examples
- Reduces labeling costs by 40-70%
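As a rough sketch, labeling jobs are created through the `create_labeling_job` API. The job name, S3 paths, and workteam ARN below are hypothetical, and the full `HumanTaskConfig` (UI template, annotation Lambdas, worker counts) is elided:

```python
import boto3

sm = boto3.client('sagemaker', region_name='ap-southeast-2')

sm.create_labeling_job(
    LabelingJobName='support-ticket-classification',
    LabelAttributeName='category',
    InputConfig={
        'DataSource': {
            'S3DataSource': {'ManifestS3Uri': f's3://{bucket}/labeling/input.manifest'}
        }
    },
    OutputConfig={'S3OutputPath': f's3://{bucket}/labeling/output/'},
    RoleArn=role,
    LabelCategoryConfigS3Uri=f's3://{bucket}/labeling/categories.json',
    HumanTaskConfig={
        'WorkteamArn': workteam_arn,  # your private or vendor workforce
        # UiConfig, pre/post-annotation Lambda ARNs, workers per object
        # and task time limits are also required; elided here for brevity
    }
)
```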
SageMaker Feature Store
Centralised repository for ML features:
from sagemaker.feature_store.feature_group import FeatureGroup
# Define feature group
feature_group = FeatureGroup(
name='customer-features',
sagemaker_session=session
)
# Load feature definitions
feature_group.load_feature_definitions(
data_frame=features_df
)
# Create in Feature Store
feature_group.create(
s3_uri=f's3://{bucket}/feature-store',
record_identifier_name='customer_id',
event_time_feature_name='event_time',
role_arn=role,
enable_online_store=True
)
# Ingest features
feature_group.ingest(
data_frame=features_df,
max_workers=3,
wait=True
)
# Retrieve features for training (run the Athena query, wait for it,
# then load the results as a DataFrame)
query = feature_group.athena_query()
query.run(
    query_string=f"""
    SELECT *
    FROM "{query.table_name}"
    WHERE event_time >= '2025-01-01'
    """,
    output_location=f's3://{bucket}/query-results/'
)
query.wait()
training_data = query.as_dataframe()
Common Use Cases
Demand Forecasting
Predict inventory needs:
- Time-series data
- Seasonality
- External factors (promotions, weather)
- DeepAR algorithm
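As a sketch, the built-in DeepAR estimator is configured much like the XGBoost example earlier; the output path and hyperparameter values here are illustrative placeholders:

```python
deepar = Estimator(
    image_uri=sagemaker.image_uris.retrieve('forecasting-deepar', 'ap-southeast-2', '1'),
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{bucket}/demand-forecast/output',
    sagemaker_session=session
)
deepar.set_hyperparameters(
    time_freq='D',          # daily observations
    prediction_length=14,   # forecast horizon (days)
    context_length=28,      # history the model conditions on
    epochs=100
)
```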
Fraud Detection
Real-time fraud detection:
- Transaction patterns
- Anomaly detection
- Real-time inference
- Low latency requirements
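The transaction-pattern idea can be illustrated with a simple z-score detector; SageMaker's built-in Random Cut Forest algorithm is the managed equivalent for production anomaly detection. A minimal sketch with made-up transaction amounts:

```python
def zscore_anomalies(values, threshold=3.0):
    """Return indices of points more than `threshold` std devs from the mean."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return []  # no variation, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]

# A single $5,000 transaction among ordinary $50 ones stands out
txns = [50.0] * 20 + [5000.0]
print(zscore_anomalies(txns))  # [20]
```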
Personalization
Product recommendations:
- Collaborative filtering
- Content-based filtering
- Hybrid approaches
- Batch or real-time
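At the heart of collaborative filtering is a similarity measure over user rating vectors; a minimal cosine-similarity sketch with hypothetical ratings (0 means the user hasn't rated that product):

```python
def cosine_similarity(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Users who rated the same products similarly score close to 1
alice = [5, 3, 0, 1]
bob   = [4, 2, 0, 1]
carol = [0, 0, 5, 4]
similar = cosine_similarity(alice, bob)      # high
different = cosine_similarity(alice, carol)  # low
```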
Predictive Maintenance
Equipment failure prediction:
- Sensor data
- Time-series
- Classification or regression
- IoT integration
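Sensor time-series are usually turned into windowed features before training; a minimal rolling-average sketch (window size and values are illustrative):

```python
def rolling_mean(values, window):
    """Rolling average over a fixed window, a common sensor-data feature."""
    if window <= 0 or window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Smooth a vibration reading before feeding it to a classifier
readings = [1, 2, 3, 4]
print(rolling_mean(readings, 2))  # [1.5, 2.5, 3.5]
```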
Best Practices
Experiment Tracking
from sagemaker.experiments import Run
with Run(
    experiment_name='churn-prediction-experiment',
    run_name='xgboost-v1',
    sagemaker_session=session
) as run:
    # Log parameters
    run.log_parameters({
        'max_depth': 5,
        'eta': 0.2,
        'num_round': 100
    })
    # Train model
    xgboost.fit({'train': train_path})
    # Log metrics
    run.log_metric(name='validation:auc', value=0.87)
Model Registry
Version control for models:
from sagemaker.model import Model
from sagemaker.model_metrics import ModelMetrics, MetricsSource

model = Model(
    image_uri=container_image,
    model_data=model_artifact_s3_uri,
    role=role
)
# Metrics must reference an evaluation report stored in S3
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=f's3://{bucket}/churn-model/evaluation/metrics.json',
        content_type='application/json'
    )
)
model_package = model.register(
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium'],
    transform_instances=['ml.m5.large'],
    model_package_group_name='ChurnPredictionModels',
    model_metrics=model_metrics,
    approval_status='Approved'
)
A/B Testing
Test model versions:
from sagemaker.session import production_variant

# model_v1 and model_v2 are sagemaker.model.Model objects already
# created in SageMaker; define two variants with a 50/50 traffic split
variant_v1 = production_variant(
    model_name=model_v1.name,
    instance_type='ml.m5.large',
    initial_instance_count=1,
    variant_name='variant-v1',
    initial_weight=50
)
variant_v2 = production_variant(
    model_name=model_v2.name,
    instance_type='ml.m5.large',
    initial_instance_count=1,
    variant_name='variant-v2',
    initial_weight=50
)
# Deploy both variants behind a single endpoint
session.endpoint_from_production_variants(
    name='ab-test-endpoint',
    production_variants=[variant_v1, variant_v2]
)
# Compare per-variant CloudWatch metrics, then gradually
# shift traffic weights to the better performer
Security and Compliance
VPC Configuration
Run in your VPC:
# Training estimators take VPC settings directly
# (NetworkConfig is used for processing jobs)
estimator = Estimator(
    ...
    subnets=['subnet-12345678', 'subnet-87654321'],
    security_group_ids=['sg-12345678'],
    enable_network_isolation=True
)
Encryption
At rest: KMS encryption for S3 buckets and EBS volumes
In transit: TLS for all API calls
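On the training side, these controls map directly to estimator parameters; a sketch (the key ARN is hypothetical):

```python
estimator = Estimator(
    ...
    # Encrypts the EBS volumes attached to training instances
    volume_kms_key='arn:aws:kms:ap-southeast-2:123456789012:key/EXAMPLE',
    # Encrypts model artifacts written to the S3 output path
    output_kms_key='arn:aws:kms:ap-southeast-2:123456789012:key/EXAMPLE'
)
```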
IAM Permissions
Least privilege:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "sagemaker:CreateTrainingJob",
      "sagemaker:CreateModel",
      "sagemaker:CreateEndpoint"
    ],
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "aws:RequestedRegion": "ap-southeast-2"
      }
    }
  }]
}
Conclusion
Amazon SageMaker provides a complete platform for custom machine learning. For Australian businesses with unique requirements or competitive advantages from custom models, SageMaker offers the tools to build, train, and deploy ML at scale.
While more complex than pre-built AI services, SageMaker’s flexibility and power enable sophisticated ML use cases. Combined with cost optimisation strategies and MLOps best practices, it’s a robust platform for ML-driven innovation.
CloudPoint helps Australian businesses implement SageMaker, from use case definition and data strategy through model development, deployment, and ongoing optimisation. Contact us to discuss your custom ML requirements.
Need Help with SageMaker?
CloudPoint helps Australian businesses implement SageMaker for custom ML solutions. Get in touch to discuss your requirements.