AWS AI Services

Amazon SageMaker: Building Custom ML Models for Australian Businesses

Comprehensive guide to using Amazon SageMaker for building, training, and deploying custom machine learning models tailored to your business needs.

CloudPoint Team

While pre-built AI services work for many use cases, some Australian businesses need custom machine learning models trained on their specific data. Amazon SageMaker provides a complete platform for building, training, and deploying custom ML models without managing infrastructure.

When to Use SageMaker

Use SageMaker when:

  • Pre-built AI services don’t fit your use case
  • You have domain-specific data
  • You need custom model logic
  • You have ML expertise in-house
  • A custom model would give you a unique competitive advantage

Use pre-built AI services when:

  • Your use case is standard (sentiment analysis, transcription, etc.)
  • You have limited ML expertise in-house
  • You need quick time-to-market
  • Cost is a primary concern

SageMaker Components

SageMaker Studio

Integrated development environment for ML:

  • Jupyter notebooks
  • Experiment tracking
  • Model registry
  • Pipeline management
  • Team collaboration

SageMaker Training

Managed infrastructure for model training:

  • Automatic scaling
  • Distributed training
  • Spot instances support
  • Built-in algorithms
  • Custom algorithms

SageMaker Inference

Deploy models for predictions:

  • Real-time endpoints
  • Batch transform
  • Serverless inference
  • Multi-model endpoints
  • Auto-scaling

SageMaker Pipelines

Orchestrate ML workflows:

  • Data processing
  • Training
  • Evaluation
  • Deployment
  • CI/CD for ML

Building a Custom Model

Example: Customer Churn Prediction

An Australian telco wants to predict which customers will churn.

Step 1: Prepare Data

import pandas as pd
import boto3
import sagemaker
from sagemaker import get_execution_role

# Load data
df = pd.read_csv('customer_data.csv')

# Features
features = [
    'tenure_months',
    'monthly_charges',
    'total_charges',
    'contract_type',
    'payment_method',
    'internet_service',
    'support_calls_last_month'
]

target = 'churned'

# Prepare for SageMaker's built-in XGBoost: target column first.
# Assumes categorical columns (contract_type, etc.) are already
# numerically encoded.
train_data = df[[target] + features]

# Built-in XGBoost expects CSV with no header row and no index
train_data.to_csv('train.csv', index=False, header=False)

# Upload to S3
session = sagemaker.Session()
bucket = session.default_bucket()

train_path = session.upload_data(
    'train.csv',
    bucket=bucket,
    key_prefix='churn-model/data'
)

Step 2: Train Model

Using SageMaker’s built-in XGBoost:

from sagemaker.estimator import Estimator

role = get_execution_role()

# Configure XGBoost
xgboost = Estimator(
    image_uri=sagemaker.image_uris.retrieve('xgboost', 'ap-southeast-2', '1.5-1'),
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{bucket}/churn-model/output',
    sagemaker_session=session,
    use_spot_instances=True,  # Save up to 90%
    max_run=3600,
    max_wait=7200
)

# Set hyperparameters
xgboost.set_hyperparameters(
    objective='binary:logistic',
    num_round=100,
    max_depth=5,
    eta=0.2,
    subsample=0.8,
    colsample_bytree=0.8
)

# Train model (built-in XGBoost needs the content type declared)
from sagemaker.inputs import TrainingInput

xgboost.fit({'train': TrainingInput(train_path, content_type='text/csv')})

Step 3: Deploy Model

from sagemaker.serializers import CSVSerializer

# Deploy to real-time endpoint
predictor = xgboost.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',  # Small instance for low traffic
    endpoint_name='churn-prediction-endpoint',
    serializer=CSVSerializer()  # Endpoint expects CSV input
)

# Make predictions (features in the same order and encoding as training)
test_data = [[
    12,      # tenure_months
    79.99,   # monthly_charges
    960.00,  # total_charges
    1,       # contract_type (encoded)
    0,       # payment_method (encoded)
    1,       # internet_service (encoded)
    3        # support_calls_last_month
]]

response = predictor.predict(test_data)
churn_probability = float(response.decode('utf-8').strip())
print(f"Churn probability: {churn_probability:.2%}")

Step 4: Monitor and Update

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor import CronExpressionGenerator

# Create model monitor
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    max_runtime_in_seconds=1800
)

# Create baseline from training data
monitor.suggest_baseline(
    baseline_dataset=train_path,
    dataset_format={'csv': {'header': True}},
    output_s3_uri=f's3://{bucket}/churn-model/baseline'
)

# Schedule monitoring (hourly)
monitor.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output_s3_uri=f's3://{bucket}/churn-model/monitoring',
    statistics=f's3://{bucket}/churn-model/baseline/statistics.json',
    constraints=f's3://{bucket}/churn-model/baseline/constraints.json',
    schedule_cron_expression=CronExpressionGenerator.hourly()
)

Cost Optimisation

Training Costs

Use Spot Instances:

estimator = Estimator(
    ...
    use_spot_instances=True,
    max_wait=7200  # Willing to wait for spot
)

Savings: Up to 90% vs on-demand

Managed Spot Training:

  • SageMaker handles interruptions
  • Automatically resumes training
  • Checkpoints progress
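Checkpointing is the mechanism behind that resume behaviour: your training script or algorithm must write checkpoints, and SageMaker syncs them to S3. A minimal configuration sketch — the checkpoint path and `training_image` are assumptions, not values from the example above:

```python
# Hypothetical settings: SageMaker syncs /opt/ml/checkpoints inside the
# container to checkpoint_s3_uri, so an interrupted spot job can resume
# from the last checkpoint instead of starting over.
estimator = Estimator(
    image_uri=training_image,   # assumed to be defined elsewhere
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    use_spot_instances=True,
    max_run=3600,
    max_wait=7200,
    checkpoint_s3_uri=f's3://{bucket}/churn-model/checkpoints',
    checkpoint_local_path='/opt/ml/checkpoints'
)
```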

Inference Costs

Serverless Inference:

For infrequent or variable traffic:

from sagemaker.serverless import ServerlessInferenceConfig

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=10
)

predictor = model.deploy(
    serverless_inference_config=serverless_config
)

Benefits:

  • Pay per request
  • No charge when idle
  • Auto-scaling
  • Good for < 10 requests/minute
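To see where a threshold like that comes from, a back-of-the-envelope comparison helps. All rates below are placeholder assumptions, not actual AWS prices — check current ap-southeast-2 pricing before deciding:

```python
# Illustrative break-even sketch: serverless pays per request-duration,
# a real-time endpoint pays per instance-hour whether busy or idle.
# All prices here are assumed placeholders, not actual AWS rates.

HOURS_PER_MONTH = 730

def monthly_cost_realtime(instance_hourly_rate):
    # Endpoint runs around the clock
    return instance_hourly_rate * HOURS_PER_MONTH

def monthly_cost_serverless(requests, avg_duration_s, gb_configured,
                            rate_per_gb_second):
    # Billed only for compute time actually consumed
    return requests * avg_duration_s * gb_configured * rate_per_gb_second

# Example: small always-on instance vs a 2 GB serverless config
realtime = monthly_cost_realtime(0.08)            # assumed $/hour
serverless = monthly_cost_serverless(
    requests=100_000, avg_duration_s=0.1,
    gb_configured=2, rate_per_gb_second=0.00003   # assumed rate
)
print(f"real-time: ${realtime:.2f}/month, serverless: ${serverless:.2f}/month")
```

At low, bursty volumes the serverless figure stays well under the always-on instance; as traffic grows, the lines cross and a real-time endpoint wins.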

Auto-scaling for Real-time Endpoints:

import boto3

endpoint_name = 'churn-prediction-endpoint'
client = boto3.client('application-autoscaling')

# Register scalable target
client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=5
)

# Target tracking policy
client.put_scaling_policy(
    PolicyName='target-tracking-scaling-policy',
    ServiceNamespace='sagemaker',
    ResourceId=f'endpoint/{endpoint_name}/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # 70% invocations per instance
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
        }
    }
)

Multi-Model Endpoints

Host multiple models on same endpoint:

from sagemaker.multidatamodel import MultiDataModel

mdm = MultiDataModel(
    name='multi-model-endpoint',
    model_data_prefix=f's3://{bucket}/models/',
    image_uri=container_image,
    role=role
)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

# Invoke specific model
prediction = predictor.predict(
    data=test_data,
    target_model='model-v2.tar.gz'
)

Savings: Share infrastructure across models

MLOps with SageMaker Pipelines

Automate ML workflows:

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.inputs import TrainingInput

# Data processing step
processing_step = ProcessingStep(
    name='PrepareData',
    processor=processor,
    inputs=[...],
    outputs=[...],
    code='preprocess.py'
)

# Training step
training_step = TrainingStep(
    name='TrainModel',
    estimator=xgboost,
    inputs={
        'train': TrainingInput(
            s3_data=processing_step.properties.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri
        )
    }
)

# Model registration
register_step = RegisterModel(
    name='RegisterModel',
    estimator=xgboost,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium', 'ml.m5.large'],
    transform_instances=['ml.m5.large'],
    model_package_group_name='ChurnPredictionModels',
    approval_status='PendingManualApproval'
)

# Create pipeline
pipeline = Pipeline(
    name='ChurnPredictionPipeline',
    parameters=[...],
    steps=[processing_step, training_step, register_step]
)

# Execute
pipeline.upsert(role_arn=role)
execution = pipeline.start()

SageMaker Ground Truth

Label training data efficiently:

Built-in Labeling Workflows:

  • Image classification
  • Object detection
  • Text classification
  • Named entity recognition

Custom Workflows:

  • Define custom UI
  • Use your labeling instructions

Active Learning:

  • Auto-labeling with high-confidence predictions
  • Humans label uncertain examples
  • Reduces labeling costs by 40-70%
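The routing logic behind active learning can be sketched in a few lines — auto-accept confident model predictions, queue the rest for human annotators. Threshold and items below are illustrative:

```python
# Minimal sketch of active-learning routing: high-confidence predictions
# become labels automatically; uncertain ones go to human annotators.

def route_for_labeling(predictions, threshold=0.9):
    auto_labeled, needs_human = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled.append((item_id, label))
        else:
            needs_human.append(item_id)
    return auto_labeled, needs_human

preds = [
    ('img-1', 'cat', 0.97),
    ('img-2', 'dog', 0.55),
    ('img-3', 'cat', 0.92),
]
auto, human = route_for_labeling(preds)
print(auto)   # confidently auto-labeled items
print(human)  # items sent to human annotators
```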

SageMaker Feature Store

Centralised repository for ML features:

from sagemaker.feature_store.feature_group import FeatureGroup

# Define feature group
feature_group = FeatureGroup(
    name='customer-features',
    sagemaker_session=session
)

# Load feature definitions
feature_group.load_feature_definitions(
    data_frame=features_df
)

# Create in Feature Store
feature_group.create(
    s3_uri=f's3://{bucket}/feature-store',
    record_identifier_name='customer_id',
    event_time_feature_name='event_time',
    role_arn=role,
    enable_online_store=True
)

# Ingest features
feature_group.ingest(
    data_frame=features_df,
    max_workers=3,
    wait=True
)

# Retrieve features for training (run the Athena query, wait for it,
# then load results as a DataFrame)
query = feature_group.athena_query()
query.run(
    query_string=f"""
        SELECT *
        FROM "{query.table_name}"
        WHERE event_time >= '2025-01-01'
    """,
    output_location=f's3://{bucket}/query-results/'
)
query.wait()
training_data = query.as_dataframe()

Common Use Cases

Demand Forecasting

Predict inventory needs:

  • Time-series data
  • Seasonality
  • External factors (promotions, weather)
  • DeepAR algorithm
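DeepAR doesn't take flat CSVs — it expects JSON Lines, one time series per line with a `start` timestamp and a `target` array of observed values. A minimal sketch with invented demand figures:

```python
import json

# DeepAR training data: one JSON object per line, each describing a
# time series. Numbers below are invented for illustration.
series = [
    {"start": "2025-01-01 00:00:00", "target": [120.0, 135.5, 128.0, 150.25]},
    {"start": "2025-01-01 00:00:00", "target": [40.0, 42.5, 39.0, 55.0],
     "cat": [1]},  # optional categorical feature, e.g. a store ID
]

# Serialise to the JSON Lines payload you would upload to S3
payload = "\n".join(json.dumps(s) for s in series)
print(payload.splitlines()[0])
```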

Fraud Detection

Real-time fraud detection:

  • Transaction patterns
  • Anomaly detection
  • Real-time inference
  • Low latency requirements
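As a toy illustration of the anomaly-detection idea — real fraud models use far richer features than a single amount — flag transactions that sit far outside a customer's own history:

```python
import statistics

# Toy z-score check: flag amounts more than z_threshold standard
# deviations from the customer's historical mean. Illustrative only.
def is_anomalous(amount, history, z_threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return amount != mean
    return abs(amount - mean) / stdev > z_threshold

history = [45.0, 52.0, 49.5, 47.0, 51.0]
print(is_anomalous(48.0, history))    # a typical amount
print(is_anomalous(950.0, history))   # far outside the history
```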

Personalisation

Product recommendations:

  • Collaborative filtering
  • Content-based filtering
  • Hybrid approaches
  • Batch or real-time

Predictive Maintenance

Equipment failure prediction:

  • Sensor data
  • Time-series
  • Classification or regression
  • IoT integration
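A common preparation step here is windowing: slice the sensor stream into fixed-size feature windows and label each by whether a failure occurs within a short horizon. A minimal sketch with invented readings:

```python
# Turn a raw sensor stream into (window, label) training samples, where
# label = 1 if a failure occurs within `horizon` steps after the window.
# Readings and the failure index are invented for illustration.

def make_windows(readings, failures, window=3, horizon=2):
    samples = []
    for i in range(len(readings) - window):
        features = readings[i:i + window]
        label = any(t in failures
                    for t in range(i + window, i + window + horizon))
        samples.append((features, int(label)))
    return samples

readings = [0.1, 0.12, 0.11, 0.4, 0.9, 1.3, 0.2]
failures = {5}  # index at which the machine failed
samples = make_windows(readings, failures)
print(samples)
```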

Best Practices

Experiment Tracking

from sagemaker.experiments.run import Run

with Run(
    experiment_name='churn-prediction-experiment',
    run_name='xgboost-v1',
    sagemaker_session=session
) as run:
    # Log parameters
    run.log_parameters({
        'max_depth': 5,
        'eta': 0.2,
        'num_round': 100
    })

    # Train model
    xgboost.fit({'train': train_path})

    # Log metrics
    run.log_metric(name='validation:auc', value=0.87)

Model Registry

Version control for models:

from sagemaker.model import Model
from sagemaker.model_metrics import ModelMetrics, MetricsSource

model = Model(
    image_uri=container_image,
    model_data=model_artifact_s3_uri,
    role=role
)

# Metrics are referenced as a JSON report in S3, not passed inline
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri=f's3://{bucket}/churn-model/evaluation/evaluation.json',
        content_type='application/json'
    )
)

model_package = model.register(
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium'],
    transform_instances=['ml.m5.large'],
    model_package_group_name='ChurnPredictionModels',
    model_metrics=model_metrics,
    approval_status='Approved'
)

A/B Testing

Test model versions:

import boto3

sm = boto3.client('sagemaker')

# One endpoint config with two weighted production variants
# (assumes both models already exist in SageMaker)
sm.create_endpoint_config(
    EndpointConfigName='ab-test-config',
    ProductionVariants=[
        {
            'VariantName': 'variant-v1',
            'ModelName': 'churn-model-v1',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 50
        },
        {
            'VariantName': 'variant-v2',
            'ModelName': 'churn-model-v2',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 50
        }
    ]
)

sm.create_endpoint(
    EndpointName='ab-test-endpoint',
    EndpointConfigName='ab-test-config'
)

# Monitor per-variant CloudWatch metrics, then gradually shift traffic
# to the better performer with update_endpoint_weights_and_capacities

Security and Compliance

VPC Configuration

Run in your VPC:

from sagemaker.network import NetworkConfig

# NetworkConfig applies to processing jobs
network_config = NetworkConfig(
    enable_network_isolation=True,
    security_group_ids=['sg-12345678'],
    subnets=['subnet-12345678', 'subnet-87654321']
)

# Training jobs take VPC settings directly on the Estimator
estimator = Estimator(
    ...
    subnets=['subnet-12345678', 'subnet-87654321'],
    security_group_ids=['sg-12345678'],
    enable_network_isolation=True
)

Encryption

  • At rest: KMS encryption for S3 and EBS volumes
  • In transit: TLS for all API calls

IAM Permissions

Least privilege:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "sagemaker:CreateTrainingJob",
            "sagemaker:CreateModel",
            "sagemaker:CreateEndpoint"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:RequestedRegion": "ap-southeast-2"
            }
        }
    }]
}

Conclusion

Amazon SageMaker provides a complete platform for custom machine learning. For Australian businesses with unique requirements or competitive advantages from custom models, SageMaker offers the tools to build, train, and deploy ML at scale.

While more complex than pre-built AI services, SageMaker’s flexibility and power enable sophisticated ML use cases. Combined with cost optimisation strategies and MLOps best practices, it’s a robust platform for ML-driven innovation.

CloudPoint helps Australian businesses implement SageMaker - from use case definition and data strategy through model development, deployment, and ongoing optimisation. Contact us to discuss your custom ML requirements.

