AWS AI Services

Amazon SageMaker: Building Custom ML Models for Australian Businesses

Comprehensive guide to using Amazon SageMaker for building, training, and deploying custom machine learning models tailored to your business needs.

CloudPoint Team

While pre-built AI services work for many use cases, some Australian businesses need custom machine learning models trained on their specific data. Amazon SageMaker provides a complete platform for building, training, and deploying custom ML models without managing infrastructure.

When to Use SageMaker

Use SageMaker when:

  • Pre-built AI services don’t fit your use case
  • You have domain-specific data
  • You need custom model logic
  • You have ML expertise in-house
  • A custom model would give you a unique competitive advantage

Use pre-built AI services when:

  • Your use case is standard (sentiment analysis, transcription, etc.)
  • You have limited ML expertise in-house
  • You need quick time-to-market
  • Cost is a primary concern

SageMaker Components

SageMaker Studio

Integrated development environment for ML:

  • Jupyter notebooks
  • Experiment tracking
  • Model registry
  • Pipeline management
  • Team collaboration

SageMaker Training

Managed infrastructure for model training:

  • Automatic scaling
  • Distributed training
  • Spot instances support
  • Built-in algorithms
  • Custom algorithms

SageMaker Inference

Deploy models for predictions:

  • Real-time endpoints
  • Batch transform
  • Serverless inference
  • Multi-model endpoints
  • Auto-scaling

SageMaker Pipelines

Orchestrate ML workflows:

  • Data processing
  • Training
  • Evaluation
  • Deployment
  • CI/CD for ML

Building a Custom Model

Example: Customer Churn Prediction

An Australian telco wants to predict which customers will churn.

Step 1: Prepare Data

import pandas as pd
import boto3
import sagemaker
from sagemaker import get_execution_role

# Load data
df = pd.read_csv('customer_data.csv')

# Features
features = [
    'tenure_months',
    'monthly_charges',
    'total_charges',
    'contract_type',
    'payment_method',
    'internet_service',
    'support_calls_last_month'
]

target = 'churned'

# Prepare for SageMaker's built-in XGBoost: target column first.
# Assumes categorical columns (contract_type, etc.) are already
# numerically encoded.
train_data = df[[target] + features]

# Built-in XGBoost expects CSV with no header row and no index
train_data.to_csv('train.csv', index=False, header=False)

# Upload to S3
session = sagemaker.Session()
bucket = session.default_bucket()

train_path = session.upload_data(
    'train.csv',
    bucket=bucket,
    key_prefix='churn-model/data'
)

Step 2: Train Model

Using SageMaker’s built-in XGBoost:

from sagemaker.estimator import Estimator

role = get_execution_role()

# Configure XGBoost
xgboost = Estimator(
    image_uri=sagemaker.image_uris.retrieve('xgboost', 'ap-southeast-2', '1.5-1'),
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{bucket}/churn-model/output',
    sagemaker_session=session,
    use_spot_instances=True,  # Save up to 90%
    max_run=3600,
    max_wait=7200
)

# Set hyperparameters
xgboost.set_hyperparameters(
    objective='binary:logistic',
    num_round=100,
    max_depth=5,
    eta=0.2,
    subsample=0.8,
    colsample_bytree=0.8
)

# Train model (built-in XGBoost needs the content type declared)
from sagemaker.inputs import TrainingInput

xgboost.fit({'train': TrainingInput(train_path, content_type='text/csv')})

Step 3: Deploy Model

from sagemaker.serializers import CSVSerializer

# Deploy to real-time endpoint
predictor = xgboost.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',  # Small instance for low traffic
    endpoint_name='churn-prediction-endpoint',
    serializer=CSVSerializer()  # Endpoint expects CSV input
)

# Make predictions (features in the same order and encoding as training)
test_data = [[
    12,      # tenure_months
    79.99,   # monthly_charges
    960.00,  # total_charges
    1,       # contract_type (encoded)
    0,       # payment_method (encoded)
    1,       # internet_service (encoded)
    3        # support_calls_last_month
]]

response = predictor.predict(test_data)
churn_probability = float(response.decode('utf-8').strip())
print(f"Churn probability: {churn_probability:.2%}")

Step 4: Monitor and Update

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor import CronExpressionGenerator

# Create model monitor
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    max_runtime_in_seconds=1800
)

# Create baseline from training data
monitor.suggest_baseline(
    baseline_dataset=train_path,
    dataset_format={'csv': {'header': True}},
    output_s3_uri=f's3://{bucket}/churn-model/baseline'
)

# Schedule monitoring (hourly)
monitor.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output_s3_uri=f's3://{bucket}/churn-model/monitoring',
    statistics=f's3://{bucket}/churn-model/baseline/statistics.json',
    constraints=f's3://{bucket}/churn-model/baseline/constraints.json',
    schedule_cron_expression=CronExpressionGenerator.hourly()
)

Cost Optimisation

Training Costs

Use Spot Instances:

estimator = Estimator(
    ...
    use_spot_instances=True,
    max_wait=7200  # Willing to wait for spot
)

Savings: Up to 90% vs on-demand

Managed Spot Training:

  • SageMaker handles interruptions
  • Automatically resumes training
  • Checkpoints progress
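Checkpointing is the mechanism behind that resume behaviour: your training script or algorithm must write checkpoints, and SageMaker syncs them to S3. A minimal configuration sketch — the checkpoint path and `training_image` are assumptions, not values from the example above:

```python
# Hypothetical settings: SageMaker syncs /opt/ml/checkpoints inside the
# container to checkpoint_s3_uri, so an interrupted spot job can resume
# from the last checkpoint instead of starting over.
estimator = Estimator(
    image_uri=training_image,   # assumed to be defined elsewhere
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    use_spot_instances=True,
    max_run=3600,
    max_wait=7200,
    checkpoint_s3_uri=f's3://{bucket}/churn-model/checkpoints',
    checkpoint_local_path='/opt/ml/checkpoints'
)
```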

Inference Costs

Serverless Inference:

For infrequent or variable traffic:

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=10
)

predictor = model.deploy(
    serverless_inference_config=serverless_config
)

Benefits:

  • Pay per request
  • No charge when idle
  • Auto-scaling
  • Good for < 10 requests/minute
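To see where a threshold like that comes from, a back-of-the-envelope comparison helps. All rates below are placeholder assumptions, not actual AWS prices — check current ap-southeast-2 pricing before deciding:

```python
# Illustrative break-even sketch: serverless pays per request-duration,
# a real-time endpoint pays per instance-hour whether busy or idle.
# All prices here are assumed placeholders, not actual AWS rates.

HOURS_PER_MONTH = 730

def monthly_cost_realtime(instance_hourly_rate):
    # Endpoint runs around the clock
    return instance_hourly_rate * HOURS_PER_MONTH

def monthly_cost_serverless(requests, avg_duration_s, gb_configured,
                            rate_per_gb_second):
    # Billed only for compute time actually consumed
    return requests * avg_duration_s * gb_configured * rate_per_gb_second

# Example: small always-on instance vs a 2 GB serverless config
realtime = monthly_cost_realtime(0.08)            # assumed $/hour
serverless = monthly_cost_serverless(
    requests=100_000, avg_duration_s=0.1,
    gb_configured=2, rate_per_gb_second=0.00003   # assumed rate
)
print(f"real-time: ${realtime:.2f}/month, serverless: ${serverless:.2f}/month")
```

At low, bursty volumes the serverless figure stays well under the always-on instance; as traffic grows, the lines cross and a real-time endpoint wins.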

Auto-scaling for Real-time Endpoints:

import boto3

endpoint_name = 'churn-prediction-endpoint'
client = boto3.client('application-autoscaling')

# Register scalable target
client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=5
)

# Target tracking policy
client.put_scaling_policy(
    PolicyName='target-tracking-scaling-policy',
    ServiceNamespace='sagemaker',
    ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # 70% invocations per instance
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
        }
    }
)

Multi-Model Endpoints

Host multiple models on same endpoint:

from sagemaker.multidatamodel import MultiDataModel

mdm = MultiDataModel(
    name='multi-model-endpoint',
    model_data_prefix=f's3://{bucket}/models/',
    image_uri=container_image,
    role=role
)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

# Invoke specific model
prediction = predictor.predict(
    data=test_data,
    target_model='model-v2.tar.gz'
)

Savings: Share infrastructure across models

MLOps with SageMaker Pipelines

Automate ML workflows:

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.inputs import TrainingInput

# Data processing step
processing_step = ProcessingStep(
    name='PrepareData',
    processor=processor,
    inputs=[...],
    outputs=[...],
    code='preprocess.py'
)

# Training step
training_step = TrainingStep(
    name='TrainModel',
    estimator=xgboost,
    inputs={
        'train': TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri
        )
    }
)

# Model registration
register_step = RegisterModel(
    name='RegisterModel',
    estimator=xgboost,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium', 'ml.m5.large'],
    transform_instances=['ml.m5.large'],
    model_package_group_name='ChurnPredictionModels',
    approval_status='PendingManualApproval'
)

# Create pipeline
pipeline = Pipeline(
    name='ChurnPredictionPipeline',
    parameters=[...],
    steps=[processing_step, training_step, register_step]
)

# Execute
pipeline.upsert(role_arn=role)
execution = pipeline.start()

SageMaker Ground Truth

Label training data efficiently:

Built-in Labeling Workflows:

  • Image classification
  • Object detection
  • Text classification
  • Named entity recognition

Custom Workflows:

  • Define custom UI
  • Use your labeling instructions

Active Learning:

  • Auto-labeling with high-confidence predictions
  • Humans label uncertain examples
  • Reduces labeling costs by 40-70%
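The routing logic behind active learning can be sketched in a few lines — auto-accept confident model predictions, queue the rest for human annotators. Threshold and items below are illustrative:

```python
# Minimal sketch of active-learning routing: high-confidence predictions
# become labels automatically; uncertain ones go to human annotators.

def route_for_labeling(predictions, threshold=0.9):
    auto_labeled, needs_human = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled.append((item_id, label))
        else:
            needs_human.append(item_id)
    return auto_labeled, needs_human

preds = [
    ('img-1', 'cat', 0.97),
    ('img-2', 'dog', 0.55),
    ('img-3', 'cat', 0.92),
]
auto, human = route_for_labeling(preds)
print(auto)   # confidently auto-labeled items
print(human)  # items sent to human annotators
```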

SageMaker Feature Store

Centralised repository for ML features:

from sagemaker.feature_store.feature_group import FeatureGroup

# Define feature group
feature_group = FeatureGroup(
    name='customer-features',
    sagemaker_session=session
)

# Load feature definitions
feature_group.load_feature_definitions(
    data_frame=features_df
)

# Create in Feature Store
feature_group.create(
    s3_uri=f's3://{bucket}/feature-store',
    record_identifier_name='customer_id',
    event_time_feature_name='event_time',
    role_arn=role,
    enable_online_store=True
)

# Ingest features
feature_group.ingest(
    data_frame=features_df,
    max_workers=3,
    wait=True
)

# Retrieve features for training (run the Athena query, wait for it,
# then load results as a DataFrame)
query = feature_group.athena_query()
query.run(
    query_string=f"""
        SELECT *
        FROM "{query.table_name}"
        WHERE event_time >= '2025-01-01'
    """,
    output_location=f's3://{bucket}/query-results/'
)
query.wait()
training_data = query.as_dataframe()

Common Use Cases

Demand Forecasting

Predict inventory needs:

  • Time-series data
  • Seasonality
  • External factors (promotions, weather)
  • DeepAR algorithm
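DeepAR doesn't take flat CSVs — it expects JSON Lines, one time series per line with a `start` timestamp and a `target` array of observed values. A minimal sketch with invented demand figures:

```python
import json

# DeepAR training data: one JSON object per line, each describing a
# time series. Numbers below are invented for illustration.
series = [
    {"start": "2025-01-01 00:00:00", "target": [120.0, 135.5, 128.0, 150.25]},
    {"start": "2025-01-01 00:00:00", "target": [40.0, 42.5, 39.0, 55.0],
     "cat": [1]},  # optional categorical feature, e.g. a store ID
]

# Serialise to the JSON Lines payload you would upload to S3
payload = "\n".join(json.dumps(s) for s in series)
print(payload.splitlines()[0])
```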

Fraud Detection

Real-time fraud detection:

  • Transaction patterns
  • Anomaly detection
  • Real-time inference
  • Low latency requirements
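As a toy illustration of the anomaly-detection idea — real fraud models use far richer features than a single amount — flag transactions that sit far outside a customer's own history:

```python
import statistics

# Toy z-score check: flag amounts more than z_threshold standard
# deviations from the customer's historical mean. Illustrative only.
def is_anomalous(amount, history, z_threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return amount != mean
    return abs(amount - mean) / stdev > z_threshold

history = [45.0, 52.0, 49.5, 47.0, 51.0]
print(is_anomalous(48.0, history))    # a typical amount
print(is_anomalous(950.0, history))   # far outside the history
```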

Personalisation

Product recommendations:

  • Collaborative filtering
  • Content-based filtering
  • Hybrid approaches
  • Batch or real-time

Predictive Maintenance

Equipment failure prediction:

  • Sensor data
  • Time-series
  • Classification or regression
  • IoT integration
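A common preparation step here is windowing: slice the sensor stream into fixed-size feature windows and label each by whether a failure occurs within a short horizon. A minimal sketch with invented readings:

```python
# Turn a raw sensor stream into (window, label) training samples, where
# label = 1 if a failure occurs within `horizon` steps after the window.
# Readings and the failure index are invented for illustration.

def make_windows(readings, failures, window=3, horizon=2):
    samples = []
    for i in range(len(readings) - window):
        features = readings[i:i + window]
        label = any(t in failures
                    for t in range(i + window, i + window + horizon))
        samples.append((features, int(label)))
    return samples

readings = [0.1, 0.12, 0.11, 0.4, 0.9, 1.3, 0.2]
failures = {5}  # index at which the machine failed
samples = make_windows(readings, failures)
print(samples)
```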

Best Practices

Experiment Tracking

from sagemaker.experiments.run import Run

with Run(
    experiment_name='churn-prediction-experiment',
    run_name='xgboost-v1',
    sagemaker_session=session
) as run:
    # Log parameters
    run.log_parameters({
        'max_depth': 5,
        'eta': 0.2,
        'num_round': 100
    })

    # Train model
    xgboost.fit({'train': train_path})

    # Log metrics
    run.log_metric(name='validation:auc', value=0.87)

Model Registry

Version control for models:

from sagemaker.model import Model
from sagemaker.model_metrics import ModelMetrics, MetricsSource

model = Model(
    image_uri=container_image,
    model_data=model_artifact_s3_uri,
    role=role
)

# Metrics are referenced as a JSON report in S3, not passed inline
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=f's3://{bucket}/churn-model/evaluation/evaluation.json',
        content_type='application/json'
    )
)

model_package = model.register(
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium'],
    transform_instances=['ml.m5.large'],
    model_package_group_name='ChurnPredictionModels',
    model_metrics=model_metrics,
    approval_status='Approved'
)

A/B Testing

Test model versions:

import boto3

sm = boto3.client('sagemaker')

# One endpoint config with two weighted production variants
# (assumes both models already exist in SageMaker)
sm.create_endpoint_config(
    EndpointConfigName='ab-test-config',
    ProductionVariants=[
        {
            'VariantName': 'variant-v1',
            'ModelName': 'churn-model-v1',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 50
        },
        {
            'VariantName': 'variant-v2',
            'ModelName': 'churn-model-v2',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 50
        }
    ]
)

sm.create_endpoint(
    EndpointName='ab-test-endpoint',
    EndpointConfigName='ab-test-config'
)

# Monitor per-variant CloudWatch metrics, then gradually shift traffic
# to the better performer with update_endpoint_weights_and_capacities

Security and Compliance

VPC Configuration

Run in your VPC:

from sagemaker.network import NetworkConfig

# NetworkConfig applies to processing jobs
network_config = NetworkConfig(
    enable_network_isolation=True,
    security_group_ids=['sg-12345678'],
    subnets=['subnet-12345678', 'subnet-87654321']
)

# Training jobs take VPC settings directly on the Estimator
estimator = Estimator(
    ...
    subnets=['subnet-12345678', 'subnet-87654321'],
    security_group_ids=['sg-12345678'],
    enable_network_isolation=True
)

Encryption

  • At rest: KMS encryption for S3 and EBS volumes
  • In transit: TLS for all API calls

IAM Permissions

Least privilege:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "sagemaker:CreateTrainingJob",
            "sagemaker:CreateModel",
            "sagemaker:CreateEndpoint"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:RequestedRegion": "ap-southeast-2"
            }
        }
    }]
}

Conclusion

Amazon SageMaker provides a complete platform for custom machine learning. For Australian businesses with unique requirements or competitive advantages from custom models, SageMaker offers the tools to build, train, and deploy ML at scale.

While more complex than pre-built AI services, SageMaker’s flexibility and power enable sophisticated ML use cases. Combined with cost optimisation strategies and MLOps best practices, it’s a robust platform for ML-driven innovation.

CloudPoint helps Australian businesses implement SageMaker - from use case definition and data strategy through model development, deployment, and ongoing optimisation. Contact us to discuss your custom ML requirements.

