SAP AI Core
The enterprise MLOps and model serving platform on SAP BTP. Manages the full AI lifecycle — from Git-synced training pipelines through auto-scaled inference endpoints — and serves as the execution engine for SAP Generative AI Hub and SAP Joule.
Overview
SAP AI Core is the foundational AI infrastructure service on SAP BTP that provides managed compute for training machine learning models and serving inference endpoints at enterprise scale. It is built on Kubernetes and uses Argo Workflows as the pipeline engine, enabling declarative, Git-driven ML pipelines that teams treat as code.
Multi-tenancy is achieved through Resource Groups — isolated compute, storage, secrets, and model registries per tenant or environment. A single AI Core service instance can host dozens of independent Resource Groups, making it suitable for platform teams building shared ML infrastructure for multiple business units.
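Every AI API request is scoped to one Resource Group through the AI-Resource-Group HTTP header, which the REST examples later on this page also use. The sketch below builds two requests that differ only in that header, using only the Python standard library. Several names are illustrative assumptions: the helper scoped_request, the "<token>" placeholder, and the second group production-team-b. The requests are constructed but never sent.

```python
import urllib.request

# AI API base URL from the SDK examples on this page
AI_API_URL = "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2"


def scoped_request(path: str, token: str, resource_group: str) -> urllib.request.Request:
    """Build (but do not send) an AI API GET request bound to one Resource Group.

    Hypothetical helper: the same path is resolved against a different isolated
    scope depending on the AI-Resource-Group header value.
    """
    return urllib.request.Request(
        url=f"{AI_API_URL}{path}",
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": resource_group,
        },
        method="GET",
    )


# Same endpoint, two isolated tenants (production-team-b is hypothetical)
team_a = scoped_request("/lm/executions", "<token>", "production-team-a")
team_b = scoped_request("/lm/executions", "<token>", "production-team-b")
# urllib normalises header names with str.capitalize()
print(team_a.get_header("Ai-resource-group"))  # production-team-a
print(team_b.get_header("Ai-resource-group"))  # production-team-b
```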
All operations are available through the AI API (a unified REST control plane) and the SAP AI SDK for Python and JavaScript. The visual interface — SAP AI Launchpad — provides data scientists with a web GUI over the same capabilities.
Runtime Architecture
AI API + AI Launchpad manage Resource Groups, applications, executions, and deployments via REST. All state is persisted in SAP-managed infrastructure.
Argo Workflows runs training pipelines, while inference serving pods (defined by ServingTemplates) run on KServe. Both run on Kubernetes, and each Resource Group is a Kubernetes namespace.
The foundation model router and the Orchestration Service of the Generative AI Hub run as a tenant within AI Core, so applications send prompts to SAP-managed endpoints rather than directly to model providers.
Core Concepts
Resource Group: The isolation unit in AI Core. Each Resource Group has dedicated compute quotas, a separate Artefact Store (object storage), its own model registry, and isolated secrets, which prevents cross-tenant data access.
Application: A registered Git repository path containing Argo Workflow YAML templates. AI Core polls the repository (or receives webhooks) and registers the templates it discovers as executables: WorkflowTemplate objects for training and ServingTemplate objects for inference.
Execution: An invocation of a registered workflow template with specific input parameters and artefact bindings. Executions run on managed Kubernetes compute and write output artefacts to the Artefact Store.
Artefact: A versioned file or directory stored in the Artefact Store. Input artefacts (datasets, feature stores) are mounted read-only. Output artefacts (trained models, evaluation reports) are written by executions.
Configuration: A named parameter set that binds a workflow template to specific input values. Configurations make executions reproducible, and separate configurations let the same template run with different hyperparameters or dataset versions.
Deployment: A running inference container that exposes a versioned model as an HTTPS REST endpoint. Deployments auto-scale replicas, report health status, and support graceful rolling updates.
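A Configuration pins four things: an executable, a scenario, parameter values, and input artefacts. The sketch below assembles the JSON body for the AI API /lm/configurations endpoint from those pieces. It uses the same field names as the curl example in the REST reference on this page. The helper name configuration_body is hypothetical. Passing every parameter value as a string mirrors that curl example.

```python
import json


def configuration_body(name, executable_id, scenario_id, parameters, input_artifacts):
    """Build an AI API configuration payload (hypothetical helper).

    parameters: template parameter name -> value (sent as a string)
    input_artifacts: template input artefact name -> registered artefact ID
    """
    return {
        "name": name,
        "executableId": executable_id,
        "scenarioId": scenario_id,
        "parameterBindings": [
            {"key": key, "value": str(value)}
            for key, value in sorted(parameters.items())
        ],
        "inputArtifactBindings": [
            {"key": key, "artifactId": artifact_id}
            for key, artifact_id in sorted(input_artifacts.items())
        ],
    }


# The configuration from the REST example on this page
config = configuration_body(
    "fraud-detection-v2-config",
    "fraud-detection-training",
    "fraud-detection-scenario",
    {"learning_rate": 0.001, "batch_size": 64, "epochs": 100},
    {"training-data": "<dataset-artefact-id>"},
)
print(json.dumps(config, indent=2))
```

Because the bindings are sorted by key, the same inputs always produce the same payload. That makes configurations easy to diff between runs.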
Multi-Tenancy via Resource Groups
ML Pipeline Lifecycle
SAP AI Launchpad
SAP AI Launchpad is a separate BTP subscription that provides a visual web interface over the same AI Core control plane. It is designed for data scientists and ML engineers who prefer a GUI for exploring executions, comparing run metrics, monitoring deployments, and inspecting logs — without writing API code.
AI Launchpad Capabilities
- Execution Explorer: browse training runs, filter by scenario, and compare metric outputs across runs
- Deployment Manager: monitor inference deployments, view health status, and scale replicas
- Log Viewer: real-time and historical log streaming from executions and serving pods
- Model Registry: browse versioned models, inspect artefact metadata, and promote models to deployment
- ML Operations: manage configurations, register artefacts, and trigger executions via the UI
- Gen AI Hub UI: test foundation model prompts, manage prompt templates, and view token usage
When to Use AI Launchpad vs. AI API
Inference Deployment Lifecycle
Auto-scaling: AI Core adjusts the replica count based on CPU/memory utilisation and request queue depth. Scale-to-zero is supported and is typically used for dev/test Resource Groups.
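This page does not document AI Core's exact scaling algorithm. As an illustrative assumption, the sketch below applies the standard Kubernetes Horizontal Pod Autoscaler rule: desired = ceil(current × observed / target). The result is skipped when the ratio is within a tolerance of 1 and is clamped between a minimum and a maximum replica count. A minimum of 0 models the scale-to-zero case. The function and parameter names are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, observed: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Kubernetes HPA-style replica calculation (illustrative, not AI Core's code).

    observed: current average metric per replica (e.g. CPU utilisation or
              queued requests per replica); target: desired value per replica.
    """
    if current_replicas < 1:
        raise ValueError("this sketch expects at least one running replica")
    ratio = observed / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not rescale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, observed=90, target=60))                 # 3
print(desired_replicas(4, observed=20, target=60))                 # 2
print(desired_replicas(3, observed=0, target=60, min_replicas=0))  # 0 (scale to zero)
```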
Health checks: Kubernetes liveness and readiness probes monitor each serving pod. A pod that fails its liveness probe is restarted automatically, and a deployment that cannot recover is reported with status DEAD.
Canary rollout: Deploy a new model version alongside the current one and route a configurable percentage of traffic to it before full cut-over.
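One way to realise a percentage-based canary split is deterministic weighted routing. Each request key is hashed into 100 buckets, and the lowest canary_percent buckets go to the new version. This is a generic technique shown under assumptions; this page does not describe AI Core's own routing mechanism. The function name and the "canary"/"stable" labels are hypothetical.

```python
import hashlib


def route(request_key: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request (generic weighted-split sketch).

    Hashing a stable key (e.g. a customer ID) keeps each caller on the same
    version for the whole rollout instead of flipping between versions.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


# Roughly 10% of 10,000 simulated callers should land on the canary
keys = [f"customer-{i}" for i in range(10_000)]
share = sum(route(k, 10) == "canary" for k in keys) / len(keys)
print(f"canary share: {share:.1%}")
```

Raising canary_percent step by step only ever moves callers from stable to canary, because a bucket below the old threshold stays below the new one.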
SAP AI SDK — Setup
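The SAP AI Core SDK for Python wraps the AI API in idiomatic Python clients (the examples below import it as ai_core_sdk). It authenticates with the OAuth client credentials from the BTP service key of your AI Core instance, which is often kept in an environment variable such as AICORE_SERVICE_KEY.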
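The service key is a JSON document. The sketch below shows how it maps onto the client arguments used in the SDK example. It assumes the key is exported as AICORE_SERVICE_KEY and contains the usual AI Core fields: clientid, clientsecret, url (the token host), and serviceurls.AI_API_URL. The helper name is hypothetical.

```python
import json
import os


def client_kwargs_from_service_key(raw: str) -> dict:
    """Map an AI Core service key (JSON string) to AICoreV2Client keyword arguments.

    Hypothetical helper; assumes the standard service key field names.
    """
    key = json.loads(raw)
    return {
        "base_url": key["serviceurls"]["AI_API_URL"].rstrip("/") + "/v2",
        "auth_url": key["url"].rstrip("/") + "/oauth/token",
        "client_id": key["clientid"],
        "client_secret": key["clientsecret"],
    }


# Fall back to a placeholder key so the sketch runs without real credentials
raw_key = os.environ.get("AICORE_SERVICE_KEY") or json.dumps({
    "clientid": "<client-id>",
    "clientsecret": "<client-secret>",
    "url": "https://<your-tenant>.authentication.eu10.hana.ondemand.com",
    "serviceurls": {"AI_API_URL": "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com"},
})
kwargs = client_kwargs_from_service_key(raw_key)
print(kwargs["base_url"])
# With the SDK installed: client = AICoreV2Client(**kwargs)
```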
```bash
# Install the SAP AI Core SDK for Python (provides the ai_core_sdk module)
pip install ai-core-sdk
```
Resource Groups & GitOps Applications
```python
from ai_core_sdk.ai_core_v2_client import AICoreV2Client

RESOURCE_GROUP = "production-team-a"

# Authenticate with the values from the BTP service key:
#   base_url = serviceurls.AI_API_URL + "/v2"
#   auth_url = url + "/oauth/token"
client = AICoreV2Client(
    base_url="https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2",
    auth_url="https://<your-tenant>.authentication.eu10.hana.ondemand.com/oauth/token",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Create a Resource Group for isolated tenant workloads
client.resource_groups.create(resource_group_id=RESOURCE_GROUP)

# Register a Git application (GitOps: pipeline templates sync from the repo).
# The repository itself must already be on-boarded (client.repositories.create(...)).
client.applications.create(
    application_name="fraud-detection-pipelines",
    repository_url="https://github.com/myorg/ai-core-templates",
    revision="main",
    path="workflows",
)
print("Application registered: fraud-detection-pipelines")

# List the scenarios and executables (pipeline templates) synced from Git.
# Sync runs periodically, so newly pushed templates can take a few minutes to appear.
scenarios = client.scenario.query(resource_group=RESOURCE_GROUP)
for scenario in scenarios.resources:
    executables = client.executable.query(scenario_id=scenario.id, resource_group=RESOURCE_GROUP)
    for executable in executables.resources:
        print(f"  Pipeline: {executable.id} (scenario {scenario.id})")
```
Argo Workflow Pipeline Template
Pipeline templates are stored as Argo Workflow YAML in a Git repository. AI Core syncs the repository and registers each WorkflowTemplate as an executable pipeline. The template below defines a training container with GPU support and structured input/output artefact paths.
```yaml
# workflows/training-pipeline.yaml
# Stored in Git; AI Core syncs this automatically (GitOps)
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: fraud-detection-training          # becomes the executable ID
  annotations:
    scenarios.ai.sap.com/description: "Fraud detection model training pipeline"
    scenarios.ai.sap.com/name: "Fraud Detection"
    executables.ai.sap.com/description: "Trains the fraud detection model"
    executables.ai.sap.com/name: "Fraud Detection Training"
  labels:
    scenarios.ai.sap.com/id: "fraud-detection-scenario"   # scenario ID used by configurations
    ai.sap.com/version: "1.0.0"
spec:
  entrypoint: fraud-detection-training
  arguments:
    parameters:                           # global parameters that configurations can bind
      - name: learning_rate
        default: "0.001"
      - name: batch_size
        default: "64"
      - name: epochs
        default: "50"
  templates:
    - name: fraud-detection-training
      metadata:
        labels:
          # AI Core selects compute through resource plans rather than nvidia.com/gpu limits;
          # train.l is assumed here to be a GPU-backed plan (check the plans in your region)
          ai.sap.com/resourcePlan: train.l
      inputs:
        artifacts:
          - name: training-data
            path: /data/train
      outputs:
        artifacts:
          - name: trained-model
            path: /output/model
            archive:
              none: {}
      container:
        image: "<registry>/fraud-trainer:1.0"
        command: [python, train.py]
        args:
          - "--learning-rate={{workflow.parameters.learning_rate}}"
          - "--batch-size={{workflow.parameters.batch_size}}"
          - "--epochs={{workflow.parameters.epochs}}"
          - "--data-path=/data/train"
          - "--output-path=/output/model"
```
The scenarios.ai.sap.com and executables.ai.sap.com annotations, together with the scenarios.ai.sap.com/id and ai.sap.com/version labels, are SAP AI Core metadata required for template registration. Without them, AI Core ignores the YAML file during sync. The template's metadata.name becomes the executable ID, and the scenarios.ai.sap.com/id label becomes the scenario ID that configurations reference.
Triggering Training Executions
```python
import time

from ai_api_client_sdk.models.status import Status

RESOURCE_GROUP = "production-team-a"

# Trigger a training execution. configuration_id is the ID returned when the
# configuration was created (see the AI API examples below), not its name.
execution = client.execution.create(
    configuration_id="<config-id>",
    resource_group=RESOURCE_GROUP,
)
print(f"Execution started: {execution.id}")

# Poll for completion (production use: add a timeout or an event-driven mechanism).
# Statuses are Status enum members, so compare against the enum, not strings.
TERMINAL = {Status.COMPLETED, Status.DEAD, Status.STOPPED}
while True:
    current = client.execution.get(
        execution_id=execution.id,
        resource_group=RESOURCE_GROUP,
    )
    print(f"  Status: {current.status}")
    if current.status in TERMINAL:
        break
    time.sleep(30)

# Fetch the execution logs
logs = client.execution.query_logs(
    execution_id=execution.id,
    resource_group=RESOURCE_GROUP,
)
for line in logs.data.result:
    print(f"[{line.timestamp}] {line.msg}")
```
Model Deployment & Inference
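The training-status loop above has no upper bound on how long it waits, and the deployment loop in this section has the same shape. The sketch below is a reusable alternative with an overall timeout and capped exponential backoff. It accepts any zero-argument status function, so it works for executions and deployments alike. The helper name and default intervals are assumptions; the SDK does not provide this helper.

```python
import time
from collections.abc import Callable, Hashable, Iterable


def wait_for_status(
    get_status: Callable[[], Hashable],
    terminal: Iterable[Hashable],
    timeout_s: float = 3600,
    initial_delay_s: float = 5,
    max_delay_s: float = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Hashable:
    """Poll get_status() until it returns a terminal value or the timeout expires.

    Hypothetical helper: the wait doubles after every non-terminal poll, capped at
    max_delay_s, and TimeoutError is raised before a sleep that would overrun the deadline.
    """
    terminal_set = set(terminal)
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"status still {status!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Simulated status sequence. With the SDK, get_status could be
# lambda: client.execution.get(execution_id=..., resource_group=...).status
# with the matching Status enum members as the terminal set.
statuses = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
result = wait_for_status(
    lambda: next(statuses),
    {"COMPLETED", "DEAD", "STOPPED"},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)  # COMPLETED
```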
```python
import time

import requests
from ai_api_client_sdk.models.status import Status

RESOURCE_GROUP = "production-team-a"

# Create an inference deployment from a serving configuration
deployment = client.deployment.create(
    configuration_id="<serving-config-id>",
    resource_group=RESOURCE_GROUP,
)
print(f"Deployment ID: {deployment.id}")

# Wait for the deployment to reach RUNNING
while True:
    d = client.deployment.get(
        deployment_id=deployment.id,
        resource_group=RESOURCE_GROUP,
    )
    print(f"  Deployment status: {d.status}")
    if d.status == Status.RUNNING:
        print(f"  Inference URL: {d.deployment_url}")
        break
    if d.status in (Status.DEAD, Status.STOPPED):
        raise RuntimeError(f"Deployment failed: {d.status}")
    time.sleep(15)

# Fetch an access token with the service key credentials (OAuth client credentials flow)
token_response = requests.post(
    "https://<your-tenant>.authentication.eu10.hana.ondemand.com/oauth/token",
    data={"grant_type": "client_credentials"},
    auth=("<client-id>", "<client-secret>"),
    timeout=30,
)
token_response.raise_for_status()
token = token_response.json()["access_token"]

# Call the inference endpoint. The path depends on the serving container;
# /v1/models/<name>:predict follows the KServe v1 prediction protocol.
inference_url = f"{d.deployment_url}/v1/models/fraud-detector:predict"
response = requests.post(
    inference_url,
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "AI-Resource-Group": RESOURCE_GROUP,
    },
    json={
        "instances": [
            {"amount": 4500.00, "merchant": "ONLINE_RETAIL", "location": "DE"}
        ]
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
# Example response shape: {"predictions": [{"fraud_probability": 0.03, "label": "LEGITIMATE"}]}
```
AI API — REST Reference
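All SDK operations map directly to REST endpoints. The base URL follows the pattern https://api.ai.<region>.ml.hana.ondemand.com/v2, where <region> is the landscape segment of the AI_API_URL in your service key (for example prod.eu-central-1.aws). Authentication uses a standard OAuth 2.0 client credentials flow against the XSUAA tenant of your BTP subaccount.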
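The curl examples below expect an access token in AI_CORE_TOKEN. The following sketch builds the OAuth 2.0 client credentials request with the standard library but does not send it. It assumes the token endpoint is the service key's url plus /oauth/token, and that the JSON response carries access_token, as OAuth 2.0 specifies. The credential placeholders match the SDK example; the helper name is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def token_request(auth_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build an OAuth 2.0 client credentials token request (RFC 6749, section 4.4)."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        auth_url,
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",  # HTTP Basic client authentication
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


req = token_request(
    "https://<your-tenant>.authentication.eu10.hana.ondemand.com/oauth/token",
    "<client-id>",
    "<client-secret>",
)
print(req.get_method(), req.full_url)
# To send it with real values: json.load(urllib.request.urlopen(req))["access_token"]
```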
```bash
### AI API — REST Examples (curl)
### Base URL: https://api.ai.<region>.ml.hana.ondemand.com/v2

# 1. List resource groups
curl -X GET \
  "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/admin/resourceGroups" \
  -H "Authorization: Bearer ${AI_CORE_TOKEN}"

# 2. Create a configuration (links a pipeline template to parameter values)
#    The response contains the configuration ID used as <config-id> in step 3
curl -X POST \
  ".../v2/lm/configurations" \
  -H "Authorization: Bearer ${AI_CORE_TOKEN}" \
  -H "AI-Resource-Group: production-team-a" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "fraud-detection-v2-config",
    "executableId": "fraud-detection-training",
    "scenarioId": "fraud-detection-scenario",
    "parameterBindings": [
      { "key": "learning_rate", "value": "0.001" },
      { "key": "batch_size", "value": "64" },
      { "key": "epochs", "value": "100" }
    ],
    "inputArtifactBindings": [
      { "key": "training-data", "artifactId": "<dataset-artefact-id>" }
    ]
  }'

# 3. Trigger execution
curl -X POST \
  ".../v2/lm/executions" \
  -H "Authorization: Bearer ${AI_CORE_TOKEN}" \
  -H "AI-Resource-Group: production-team-a" \
  -H "Content-Type: application/json" \
  -d '{ "configurationId": "<config-id>" }'

# 4. Get execution logs
curl -X GET \
  ".../v2/lm/executions/<exec-id>/logs?start=2025-01-01T00:00:00Z" \
  -H "Authorization: Bearer ${AI_CORE_TOKEN}" \
  -H "AI-Resource-Group: production-team-a"

# 5. Create deployment (inference serving)
curl -X POST \
  ".../v2/lm/deployments" \
  -H "Authorization: Bearer ${AI_CORE_TOKEN}" \
  -H "AI-Resource-Group: production-team-a" \
  -H "Content-Type: application/json" \
  -d '{ "configurationId": "<serving-config-id>" }'
```
Supported ML Frameworks & Runtimes
SAP AI Core runs any Docker container — there is no restriction on the ML framework used inside training pipelines or serving containers. The following frameworks are validated by SAP and referenced in official documentation:
BTP Service Connectivity
Training pipelines access HANA Cloud via the BTP Destination Service. The HANA Vector Engine is used for embedding storage and similarity search during RAG pipeline steps.
Feature data and training datasets are consumed from Datasphere via OData APIs or the Datasphere Consumption API. Data preparation pipelines run as Argo Workflow steps.
Training data extraction from SAP backend systems uses the Integration Suite as a data pipeline layer. Real-time inference calls can be initiated by SAP backend events via Integration Suite.
Joule uses AI Core as the execution engine for its custom skills (Joule Studio) and routes LLM calls through the Generative AI Hub tenant hosted on AI Core.
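Similarity search with the HANA Vector Engine ranks stored embeddings by a vector similarity or distance, which HANA Cloud exposes in SQL through functions such as COSINE_SIMILARITY. The pure-Python top-k cosine similarity sketch below shows what that ranking computes. The function names, document IDs, and three-dimensional toy vectors are illustrative assumptions; a real RAG pipeline step would run the query inside HANA instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, embeddings, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional embeddings (real embedding models produce hundreds of dimensions)
embeddings = {
    "invoice-policy": [0.9, 0.1, 0.0],
    "travel-policy": [0.1, 0.9, 0.1],
    "fraud-guideline": [0.8, 0.0, 0.6],
}
for doc_id, score in top_k([1.0, 0.0, 0.2], embeddings):
    print(f"{doc_id}: {score:.3f}")
```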
Road Map
Licensing & Commercial Model
AI Core
SAP's MLOps service on SAP BTP — providing infrastructure for AI model training, deployment, serving, and lifecycle management including access to the Generative AI Hub.
CPEA consumption-based: Resource Units for model training/serving, Inference Units for production AI workloads. Storage charged separately.
Generative AI Hub
SAP's curated access point for 20+ foundation models (GPT-4o, Claude, Gemini, Llama, DALL-E, and SAP-specific models) — with data privacy, usage tracking, and SAP context grounding.
Access via SAP AI Core (Standard plan). Token consumption is billed per model per 1,000 tokens, and requests are routed through SAP-managed AI Core endpoints rather than sent directly to model providers.
SAP AI Core — Plan Comparison
| Feature | Free Tier (exploration only, generally available) | Standard (CPEA consumption-based, generally available) |
|---|---|---|
| Compute & Scale | ||
| Purpose | Development & exploration | Production MLOps workloads |
| Resource Groups | 1 (default) | Unlimited |
| Concurrent Executions | Limited | Based on CPEA quota |
| Inference Deployments | 1 deployment | Unlimited deployments |
| GPU compute | NVIDIA A10G, V100 (by region) | |
| AI Capabilities | ||
| Generative AI Hub access | Limited model access | Full 20+ model catalogue |
| Custom model training (BYOM) | ||
| Model Registry | Basic | Full lifecycle management |
| AI Launchpad | Separate subscription | Separate subscription |
| Enterprise | ||
| SLA | None (best-effort) | SAP standard BTP SLA |
| Data residency controls | SAP-managed defaults | Region-pinned deployment |
| Commercial model | Free (with BTP account) | CPEA (Resource Units + Inference Units + Storage GB) |