SAP Generative AI Hub
SAP's managed gateway to 20+ leading foundation models — hosted on SAP AI Core so prompts and completions never leave SAP's infrastructure. Includes the Orchestration Service, prompt lifecycle management, RAG with HANA Vector Engine, and the SAP AI SDK.
Overview
The SAP Generative AI Hub is a managed service running on SAP AI Core that provides access to third-party large language models (LLMs) and multimodal foundation models through SAP-controlled endpoints. The fundamental enterprise differentiator is data sovereignty: prompts and completions route through SAP infrastructure — customer data is never sent directly to model providers and is not used for model retraining.
Developers access models through an OpenAI-compatible chat completions API, meaning existing LangChain, LlamaIndex, or OpenAI SDK code can target Gen AI Hub with minimal changes. The SAP AI SDK for Python and JavaScript wraps these endpoints in idiomatic client libraries with BTP authentication built in.
The Orchestration Service adds a production-grade layer above raw model calls: Jinja2 prompt templating, automatic RAG grounding via the HANA Vector Engine, PII masking, content filtering, and structured output parsing — enabling enterprise LLM applications without custom middleware.
Architecture & Data Sovereignty
Available Foundation Models
| Model | Provider | Type | Context | Best For | Status |
|---|---|---|---|---|---|
| gpt-4o | OpenAI / Azure | Chat + Vision | 128K | Complex reasoning, code generation, multimodal | GA |
| gpt-4o-mini | OpenAI / Azure | Chat | 128K | High-volume, cost-efficient tasks | GA |
| claude-3-5-sonnet | Anthropic | Chat + Vision | 200K | Long document analysis, instruction following | GA |
| claude-3-haiku | Anthropic | Chat | 200K | Low-latency chatbots, high-throughput tasks | GA |
| gemini-1.5-pro | Google | Chat + Vision | 1M | Very long context, multimodal workflows | GA |
| llama-3-70b | Meta (open weights) | Chat | 128K | On-premise candidate, fine-tuning with BYOM | GA |
| llama-3-8b | Meta (open weights) | Chat | 128K | Edge deployment, low-latency serving | GA |
| mistral-large | Mistral AI | Chat | 128K | European data residency, instruction following | GA |
| text-embedding-3-large | OpenAI / Azure | Embedding | 8K | RAG indexing, semantic search (3072 dimensions) | GA |
| text-embedding-ada-002 | OpenAI / Azure | Embedding | 8K | Lightweight RAG (1536 dimensions) | GA |
| dall-e-3 | OpenAI / Azure | Image Generation | N/A | Document illustration, training material generation | GA |
| claude-3-opus | Anthropic | Chat + Vision | 200K | Highest capability for complex multi-step reasoning | Planned |
Model availability varies by BTP region and SAP AI Core plan. The model catalogue is continuously updated. Verify current availability in the Gen AI Hub model catalogue in your BTP subaccount. Planned items are on the SAP Road Map and not yet Generally Available.
Orchestration Service
Prompt Lifecycle Management
Prompt Management Capabilities
- Version Control
- Every prompt change creates a new version. Previous versions remain accessible for rollback.
- Test Console
- Interactive UI in AI Launchpad to test prompts against a selected model before promotion.
- A/B Testing
- Route a percentage of traffic to a new prompt version while the existing version stays live.
- Template Variables
- Jinja2 syntax: {{?variable}} for required, {{variable|default("fallback")}} for optional.
- Few-Shot Examples
- Store reusable example lists that are injected into templates at render time.
- API Access
- Retrieve and render templates programmatically via the Prompt Management REST API.
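To make the placeholder semantics above concrete, here is a minimal pure-Python sketch of how a required `{{?variable}}` and an optional `{{variable|default("fallback")}}` placeholder could be resolved. The `render_prompt` helper and its regular expressions are hypothetical and written only for illustration; they are not the Prompt Management API or the service's actual renderer.

```python
import re

# Hypothetical patterns covering only the two placeholder forms listed above.
REQUIRED = re.compile(r"\{\{\?\s*(\w+)\s*\}\}")
OPTIONAL = re.compile(r'\{\{\s*(\w+)\s*\|\s*default\("([^"]*)"\)\s*\}\}')


def render_prompt(template: str, values: dict) -> str:
    """Fill required and optional placeholders in a prompt template."""

    def fill_required(match):
        name = match.group(1)
        if name not in values:
            # A required variable without a value is treated as an error.
            raise KeyError(f"missing required template variable: {name}")
        return str(values[name])

    def fill_optional(match):
        name, fallback = match.group(1), match.group(2)
        # An optional variable falls back to its declared default.
        return str(values.get(name, fallback))

    rendered = REQUIRED.sub(fill_required, template)
    return OPTIONAL.sub(fill_optional, rendered)


template = 'Summarise {{?document}} in a {{tone|default("neutral")}} tone.'
print(render_prompt(template, {"document": "invoice 4711"}))
# prints: Summarise invoice 4711 in a neutral tone.
```

Failing fast on a missing required variable keeps an incomplete prompt from ever reaching the model, which is the main reason to distinguish the two placeholder forms.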
Prompt Design Best Practices
RAG with SAP HANA Cloud Vector Engine
HANA Cloud natively stores vectors in REAL_VECTOR columns with ANN (approximate nearest neighbour) index support. No separate vector database required.
HANA Vector Engine supports L2 (Euclidean), Cosine Similarity, and Inner Product distance functions. Cosine similarity is recommended for text embeddings.
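The three distance functions can be compared on toy vectors. The sketch below is plain Python that mirrors the mathematical definitions only; it is not how HANA computes them. It shows why cosine similarity suits text embeddings: it ignores vector magnitude and compares direction only.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: grows with magnitude as well as alignment."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalised vectors: direction only."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [3.0, 0.0]  # same direction as the query, larger magnitude
orthogonal = [0.0, 1.0]      # unrelated direction

print(cosine_similarity(query, same_direction))  # 1.0
print(l2_distance(query, same_direction))        # 2.0
print(inner_product(query, same_direction))      # 3.0
print(cosine_similarity(query, orthogonal))      # 0.0
```

The second vector points the same way as the query, so cosine similarity reports a perfect match even though the L2 distance is non-zero.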
Combine full-text BM25 keyword search with vector similarity using HANA's CONTAINS() + COSINE_SIMILARITY() for improved recall on short queries.
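The text above does not say how the keyword and vector result lists are merged. One common option, swapped in here purely as an illustration, is reciprocal rank fusion (RRF), which combines ranked lists without needing comparable scores. The function name, the document IDs, and the conventional constant `k=60` are assumptions for this sketch.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) per list it appears in, so items
    ranked well by both keyword and vector search rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists, best match first.
keyword_hits = ["doc-badi", "doc-clean-core", "doc-joule"]   # e.g. from CONTAINS()
vector_hits = ["doc-clean-core", "doc-rap", "doc-badi"]      # e.g. from COSINE_SIMILARITY()

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# prints: ['doc-clean-core', 'doc-badi', 'doc-rap', 'doc-joule']
```

In practice the fusion could also be done inside a single SQL statement; doing it client-side keeps the two queries independently tunable.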
Use text-embedding-3-large (3072 dimensions, OpenAI via Gen AI Hub) for high-quality semantic search. For HANA Cloud tables, declare the column as REAL_VECTOR(3072). Smaller embeddings (text-embedding-ada-002, 1536 dimensions) are suitable for high-throughput, lower-cost scenarios.
Python — Chat Completions & Embeddings
The SAP AI SDK exposes Gen AI Hub models through a LangChain-compatible interface. Switching between models requires only a model name change — all models share the same API.
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# The SAP AI SDK wraps the Gen AI Hub in LangChain-compatible clients.
# Set AICORE_SERVICE_KEY environment variable to your BTP service key JSON.
proxy_client = get_proxy_client("gen-ai-hub")

# Use GPT-4o via SAP Gen AI Hub
chat_gpt4o = ChatOpenAI(
    proxy_client=proxy_client,
    proxy_model_name="gpt-4o",
    max_tokens=2000,
    temperature=0.3,
)
response = chat_gpt4o.invoke(
    "Summarise the key changes in SAP S/4HANA 2025 OP regarding Clean Core compliance."
)
print(response.content)

# Switch to Claude 3.5 Sonnet by changing only the model name
# (confirm that your SDK version serves this model through the OpenAI-compatible client)
chat_claude = ChatOpenAI(
    proxy_client=proxy_client,
    proxy_model_name="claude-3-5-sonnet",
)

# Streaming response
for chunk in chat_claude.stream(
    "Explain SAP Principal Propagation in ABAP Cloud in 3 bullet points."
):
    print(chunk.content, end="", flush=True)

from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

proxy_client = get_proxy_client("gen-ai-hub")

# Create embeddings using text-embedding-3-large via Gen AI Hub
embeddings = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)

# Embed a single document
doc_vector = embeddings.embed_query(
    "SAP Clean Core requires all extensions to use released APIs only."
)
print(f"Vector dimension: {len(doc_vector)}")  # 3072

# Embed a batch of documents (for bulk indexing into HANA Vector Engine)
documents = [
    "SAP Joule is the generative AI copilot embedded across SAP applications.",
    "SAP Build Apps provides a no-code drag-and-drop application builder.",
    "The Generative AI Hub runs on SAP AI Core with data residency guarantees.",
]
vectors = embeddings.embed_documents(documents)
print(f"Embedded {len(vectors)} documents, dimension: {len(vectors[0])}")
Python — Vector Search with SAP HANA Cloud
import hdbcli.dbapi as hdb
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# Connect to SAP HANA Cloud
conn = hdb.connect(
    address="<your-hana>.hanacloud.ondemand.com",
    port=443,
    user="DBUSER",
    password="<password>",
    encrypt=True,
)
cursor = conn.cursor()

# Create a table with a REAL_VECTOR column (HANA Vector Engine)
cursor.execute("""
    CREATE COLUMN TABLE SAP_KNOWLEDGE_BASE (
        ID NVARCHAR(100) PRIMARY KEY,
        CONTENT NCLOB,
        TOPIC NVARCHAR(200),
        SOURCE_URL NVARCHAR(500),
        EMBEDDING REAL_VECTOR(3072) -- dimension matches text-embedding-3-large
    )
""")

# Embed and store documents
proxy_client = get_proxy_client("gen-ai-hub")
embedding_model = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)
documents = [
    {
        "id": "clean-core-001",
        "content": "SAP Clean Core: all extensions must use released APIs. No custom Z-tables in the SAP namespace.",
        "topic": "Clean Core",
        "url": "https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE/2025",
    },
]
for doc in documents:
    vector = embedding_model.embed_query(doc["content"])
    # Serialise the vector as a "[v1,v2,...]" string for TO_REAL_VECTOR
    vector_str = "[" + ",".join(str(v) for v in vector) + "]"
    cursor.execute(
        "INSERT INTO SAP_KNOWLEDGE_BASE VALUES (?, ?, ?, ?, TO_REAL_VECTOR(?))",
        (doc["id"], doc["content"], doc["topic"], doc["url"], vector_str),
    )
conn.commit()
print("Documents indexed in HANA Vector Engine")

# Similarity search (cosine similarity, highest score first)
query = "What are the rules for SAP Clean Core extensions?"
query_vector = embedding_model.embed_query(query)
query_vector_str = "[" + ",".join(str(v) for v in query_vector) + "]"
cursor.execute("""
    SELECT TOP 5
        ID, CONTENT, TOPIC,
        COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR(?)) AS SCORE
    FROM SAP_KNOWLEDGE_BASE
    ORDER BY SCORE DESC
""", (query_vector_str,))
results = cursor.fetchall()
for row in results:
    # row[3] is SCORE, row[1] is CONTENT
    print(f"[{row[3]:.4f}] {row[1][:100]}...")
Python — Orchestration Service (RAG + Filtering)
The Orchestration Service provides a declarative pipeline combining RAG grounding, prompt templates, and content filtering. The example below uses a HANA Vector Engine data repository for automatic context retrieval before the LLM call.
from gen_ai_hub.orchestration.service import OrchestrationService
from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.grounding import (
    GroundingModule,
    DocumentGrounding,
    GroundingFilterSearch,
)
from gen_ai_hub.orchestration.models.content_filter import (
    ContentFilter,
    AzureContentFilter,
)

# Orchestration Service: declarative pipeline with RAG + filtering

# 1. Define the LLM
llm = LLM(
    name="gpt-4o",  # Gen AI Hub model name
    parameters={"max_tokens": 1000, "temperature": 0.2},
)

# 2. Define the prompt template (Jinja2)
template = Template(
    messages=[
        SystemMessage(
            "You are an SAP expert assistant. Answer ONLY using the provided context. "
            "If the answer is not in the context, say 'I do not have information on this topic.'"
        ),
        UserMessage(
            "Context:\n{{?context}}\n\n"
            "Question: {{?question}}\n\n"
            "Answer in clear, structured bullet points."
        ),
    ]
)

# 3. Configure RAG grounding (HANA Vector Engine)
grounding = GroundingModule(
    type="document_grounding_service",
    config=DocumentGrounding(
        input_params=["question"],
        output_param="context",
        filters=[GroundingFilterSearch(id="vector_search", data_repository_type="vector")],
    ),
)

# 4. Content filtering (Azure AI Content Safety)
content_filter = ContentFilter(
    input=AzureContentFilter(
        Hate=4, Violence=4, SelfHarm=4, Sexual=4
    ),
    output=AzureContentFilter(
        Hate=4, Violence=4, SelfHarm=4, Sexual=4
    ),
)

# 5. Assemble the orchestration config
config = OrchestrationConfig(
    llm=llm,
    template=template,
    grounding=grounding,
    content_filters=content_filter,
)

# 6. Run the orchestration pipeline
service = OrchestrationService(api_url="<orchestration-endpoint-url>", config=config)
result = service.run(
    template_values=[
        TemplateValue(name="question", value="How do I implement a BAdI in ABAP Cloud?"),
    ]
)
print(result.orchestration_result.choices[0].message.content)
# Output: Grounded answer citing retrieved knowledge base chunks
Python — Reusable Prompt Templates
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

# Prompt templates are managed in the Gen AI Hub UI and versioned.
# Reference a saved template by its ID and inject runtime variables.
proxy_client = get_proxy_client("gen-ai-hub")
chat = ChatOpenAI(proxy_client=proxy_client, proxy_model_name="gpt-4o")

# Example: PO approval recommendation prompt template (saved in Gen AI Hub)
# Template source (defined in Gen AI Hub UI):
# "Analyse this purchase order and recommend approve/reject/escalate.
# PO Number: {{po_number}}
# Vendor: {{vendor_name}}
# Amount: {{amount}} {{currency}}
# Category: {{spend_category}}
# Requestor: {{requestor_name}}
# Justification: {{justification}}
# Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"


def run_po_approval_template(po_data: dict) -> dict:
    # In production, retrieve the template from Gen AI Hub Prompt Management API
    # and render with the variables. Here shown as a direct implementation.
    # The prompt lines stay unindented so no extra whitespace reaches the model.
    prompt = f"""Analyse this purchase order and recommend approve/reject/escalate.
PO Number: {po_data['po_number']}
Vendor: {po_data['vendor_name']}
Amount: {po_data['amount']} {po_data['currency']}
Category: {po_data['spend_category']}
Requestor: {po_data['requestor_name']}
Justification: {po_data['justification']}
Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"""
    response = chat.invoke(prompt)
    return {"recommendation": response.content, "model": "gpt-4o"}


# Example invocation
result = run_po_approval_template({
    "po_number": "4500012345",
    "vendor_name": "TechSupplies GmbH",
    "amount": "87500",
    "currency": "EUR",
    "spend_category": "IT Hardware",
    "requestor_name": "Jane Doe",
    "justification": "Laptop refresh cycle for Q2 2025: 50 units Dell Latitude.",
})
print(result["recommendation"])
CAP Integration — Generative AI in SAP Applications
CAP services integrate with Gen AI Hub using the @sap-ai-sdk/foundation-models npm package. The service definition uses standard CDS actions, and the implementation calls Gen AI Hub models with full BTP auth handled by the SDK.
// File: srv/ai-service.cds
service AiService {
  action analyseDocument(
    documentContent : LargeString,
    analysisType    : String(50)  // 'invoice', 'contract', 'purchase-order'
  ) returns {
    analysis   : LargeString;
    model      : String(100);
    tokensUsed : Integer;
  };

  action summariseInvoice(
    invoiceText  : LargeString,
    maxSentences : Integer default 5
  ) returns {
    summary : LargeString;
  };

  action generateEmailDraft(
    decision : String(20),  // 'approval', 'rejection', 'escalation'
    poNumber : String(20),
    vendor   : String(100),
    amount   : String(30),
    reason   : LargeString
  ) returns {
    emailDraft : LargeString;
  };
}

// CAP service integrating with SAP Generative AI Hub
// File: srv/ai-service.js
'use strict'
const cds = require('@sap/cds')
// SAP AI SDK for JavaScript
const { AiCoreClient } = require('@sap-ai-sdk/ai-api')
// GenAI Hub chat client (OpenAI-compatible interface)
// npm install @sap-ai-sdk/foundation-models
const { AzureOpenAiChatClient } = require('@sap-ai-sdk/foundation-models')

module.exports = class AiService extends cds.ApplicationService {
  async init() {
    // Bind AI service handlers
    this.on('analyseDocument', this.analyseDocument)
    this.on('summariseInvoice', this.summariseInvoice)
    this.on('generateEmailDraft', this.generateEmailDraft)
    await super.init()
  }

  /**
   * Analyse a document and extract structured data.
   * Uses GPT-4o via Gen AI Hub; the model name matches the Gen AI Hub catalogue.
   */
  async analyseDocument(req) {
    const { documentContent, analysisType } = req.data
    // AzureOpenAiChatClient routes through SAP Gen AI Hub (not directly to Azure)
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o' })
    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP document processing assistant. Extract structured data.'
        },
        {
          role: 'user',
          content: `Analyse this ${analysisType} document and return JSON:\n\n${documentContent}`
        }
      ],
      response_format: { type: 'json_object' },
      max_tokens: 1500,
      temperature: 0.1,
    })
    return {
      analysis: response.getContent(),
      model: response.data.model,
      tokensUsed: response.data.usage?.total_tokens,
    }
  }

  /**
   * Summarise an invoice using a reusable prompt template.
   */
  async summariseInvoice(req) {
    const { invoiceText, maxSentences } = req.data
    // AzureOpenAiChatClient targets Azure OpenAI models only. Non-OpenAI models
    // such as Claude are reached through the Orchestration Service client
    // (@sap-ai-sdk/orchestration) instead, so an OpenAI model is used here.
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o' })
    const response = await client.run({
      messages: [
        {
          role: 'user',
          content: `Summarise this invoice in ${maxSentences} sentences. \
Extract: vendor, total amount, currency, due date, key line items.\n\n${invoiceText}`
        }
      ],
      max_tokens: 500,
    })
    return { summary: response.getContent() }
  }

  /**
   * Generate an approval/rejection email draft.
   */
  async generateEmailDraft(req) {
    const { decision, poNumber, vendor, amount, reason } = req.data
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o-mini' })
    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP procurement assistant. Write professional email drafts.'
        },
        {
          role: 'user',
          content: `Write a ${decision} email for PO ${poNumber} from vendor ${vendor} \
(Amount: ${amount}). Reason: ${reason}. Keep it under 100 words. Professional tone.`
        }
      ],
      max_tokens: 300,
      temperature: 0.4,
    })
    return { emailDraft: response.getContent() }
  }
}

The SAP AI SDK for JavaScript is split into packages, including @sap-ai-sdk/ai-api for AI API control-plane operations and @sap-ai-sdk/foundation-models for the OpenAI-compatible chat and embedding clients. Both are BTP-auth-aware and use the VCAP_SERVICES binding automatically in Cloud Foundry and Kyma runtimes.
REST API — Direct Integration
The Gen AI Hub exposes an OpenAI-compatible REST API. Any client that supports the OpenAI /chat/completions and /embeddings endpoints can target Gen AI Hub by changing the base URL and adding BTP token authentication.
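As a complement to the curl calls that follow, the same request shape can be sketched with the Python standard library alone. The `build_chat_request` helper, the placeholder host, the deployment ID, and the dummy token are all hypothetical; the path and the `AI-Resource-Group` header mirror the curl example in this section. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_chat_request(base_url, deployment_id, token, messages,
                       resource_group="default"):
    """Build (but do not send) a chat completions request for a deployment."""
    url = f"{base_url}/v2/inference/deployments/{deployment_id}/chat/completions"
    body = json.dumps({"messages": messages, "max_tokens": 500}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",      # BTP OAuth2 access token
            "AI-Resource-Group": resource_group,     # AI Core resource group
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    base_url="https://api.ai.example.invalid",  # placeholder, not a real host
    deployment_id="d123",                       # placeholder deployment ID
    token="dummy-token",                        # placeholder token
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request), given a real endpoint and token.
```

Only the base URL and the bearer token differ from a plain OpenAI-style call, which is the point the paragraph above makes.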
### Gen AI Hub: Chat Completions API (OpenAI-compatible)
### Direct REST call, no SDK required

# Obtain a token from BTP UAA (standard OAuth2 client credentials)
TOKEN=$(curl -s -X POST \
  "https://<tenant>.authentication.eu10.hana.ondemand.com/oauth/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=${AICORE_CLIENT_ID}" \
  -d "client_secret=${AICORE_CLIENT_SECRET}" \
  | jq -r '.access_token')

# Chat Completions: GPT-4o via Gen AI Hub
# Note: Azure OpenAI deployments may also expect an api-version query parameter;
# check the deployment details for your tenant.
curl -X POST \
  "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/<deployment-id>/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "system", "content": "You are an SAP technical expert." },
      { "role": "user", "content": "Explain the difference between ABAP Cloud Tier 1 and Tier 2." }
    ],
    "max_tokens": 500,
    "temperature": 0.3
  }'

# Embeddings endpoint (for RAG indexing)
curl -X POST \
  ".../v2/inference/deployments/<embedding-deployment-id>/embeddings" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": [
      "SAP Clean Core principles for extensibility",
      "ABAP Cloud released API usage guidelines"
    ]
  }'
Content Filtering & Guardrails
The Orchestration Service integrates Azure AI Content Safety as the default content filter, with SAP-managed configuration. Filters are applied to both the user input (before the LLM call) and the model output (before returning to the caller).
Harm Categories
Additional Guardrails
- PII Masking
- Automatic PII detection and anonymisation on input and de-anonymisation on output
- Groundedness Check
- Optional verification that LLM output is supported by the retrieved RAG context
- Token Budget
- Max token limits enforced per request to control cost and latency
- Rate Limiting
- Per-resource-group rate limits enforced by AI Core infrastructure
- Data Residency
- All inference within SAP-operated infrastructure in the deployed BTP region
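The PII masking round trip listed above (anonymise on input, de-anonymise on output) can be illustrated with a small sketch. It handles e-mail addresses only, using a regular expression; the function names and the `<EMAIL_n>` placeholder format are assumptions made for this example, and the actual service detects more entity types with its own mechanisms.

```python
import re

# Simplified e-mail pattern, sufficient for the illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text):
    """Replace each distinct e-mail address with a numbered placeholder."""
    placeholders = {}  # address -> placeholder

    def replace(match):
        address = match.group(0)
        if address not in placeholders:
            placeholders[address] = f"<EMAIL_{len(placeholders) + 1}>"
        return placeholders[address]

    masked = EMAIL.sub(replace, text)
    # Invert the mapping so the model output can be restored later.
    restore_map = {token: address for address, token in placeholders.items()}
    return masked, restore_map


def unmask(text, restore_map):
    """Put the original values back into text returned by the model."""
    for token, address in restore_map.items():
        text = text.replace(token, address)
    return text


text = "Email jane.doe@example.com and cc jane.doe@example.com or bob@example.org."
masked, restore_map = mask_emails(text)
print(masked)  # Email <EMAIL_1> and cc <EMAIL_1> or <EMAIL_2>.

model_output = "Draft sent to <EMAIL_2>."  # stand-in for an LLM response
print(unmask(model_output, restore_map))   # Draft sent to bob@example.org.
```

Reusing the same placeholder for repeated values lets the model keep track of who is who without ever seeing the real address.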
Enterprise Use Cases
Extract structured data from invoices, contracts, and purchase orders. Return typed JSON payloads to CAP services or BTP Integration Suite flows.
Index SAP Help Portal articles, internal wikis, and process documentation in HANA Vector Engine. Answer employee questions with grounded, cited responses.
Joule Studio skills call Gen AI Hub directly for NLU, slot-filling, and response generation — backed by custom knowledge sources grounded in company data.
Generate ABAP Cloud code snippets, CAP service templates, or Fiori UI5 views. Review pull requests for Clean Core compliance via automated pipeline steps.
SAP Analytics Cloud Smart Insights calls Gen AI Hub to narrate dashboard changes in natural language — explaining variance, trends, and outliers.
SAP Build Process Automation workflows call Gen AI Hub to make LLM-powered decisions within BPMN flows — email classification, escalation logic, approval recommendations.
Road Map
Licensing & Commercial Model
Generative AI Hub
SAP's curated access point for 20+ foundation models (GPT-4o, Claude, Gemini, Llama, DALL-E, and SAP-specific models) — with data privacy, usage tracking, and SAP context grounding.
Access via SAP AI Core (Standard plan). Token consumption billed per model per 1,000 tokens. All inference processed within SAP-operated infrastructure for data sovereignty.
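Because consumption is billed per model per 1,000 tokens, a rough cost estimate is simple arithmetic. The rates in the sketch below are invented placeholders in arbitrary units, not SAP prices, and the split into input and output rates is also an assumption; substitute the figures from your own contract.

```python
from decimal import Decimal

# Hypothetical rates per 1,000 tokens (arbitrary units, NOT actual SAP pricing).
PRICE_PER_1K = {
    "gpt-4o": {"input": Decimal("0.005"), "output": Decimal("0.015")},
    "gpt-4o-mini": {"input": Decimal("0.0002"), "output": Decimal("0.0006")},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost of a workload from its token counts."""
    price = PRICE_PER_1K[model]
    return (Decimal(input_tokens) / 1000 * price["input"]
            + Decimal(output_tokens) / 1000 * price["output"])


# 12,000 input tokens and 3,000 output tokens on the larger model:
# 12 * 0.005 + 3 * 0.015 = 0.105
print(estimate_cost("gpt-4o", 12_000, 3_000))
```

`Decimal` is used instead of floats so that small per-token rates add up without rounding drift.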
AI Core
SAP's MLOps service on SAP BTP — providing infrastructure for AI model training, deployment, serving, and lifecycle management including access to the Generative AI Hub.
CPEA consumption-based: Resource Units for model training/serving, Inference Units for production AI workloads. Storage charged separately.
Generative AI Hub — Access Options
| Feature | Free Tier (via AI Core Free, Generally Available) | Standard (CPEA consumption-based, Generally Available) | Joule Booster (bundled with Joule, Generally Available) |
|---|---|---|---|
| Access & Models | | | |
| Access method | AI API + Gen AI Hub UI | AI API + Orchestration + UI | Pre-configured via Joule entitlement |
| Models available | Limited subset | Full 20+ model catalogue | Joule-specific model routing |
| Monthly token quota | Limited (exploration) | Based on CPEA balance | Included in Joule Booster |
| Orchestration & RAG | | | |
| Orchestration Service | | | |
| RAG grounding (HANA) | | | |
| Prompt Management | Basic | Full lifecycle (version, A/B test) | Basic (Joule use cases) |
| Safety & Governance | | | |
| Content filtering | Default SAP settings | Configurable thresholds | SAP-managed defaults |
| PII masking | Available via Orchestration | Available via Orchestration | |
| Data residency controls | SAP-managed defaults | Region-pinned | SAP-managed defaults |
| Enterprise | | | |
| Fine-tuning (BYOM) | AI Core Standard required | | |
| SLA | None (best-effort) | SAP standard BTP SLA | Included in Joule SLA |
| Commercial model | Free | CPEA (per 1K tokens per model) | CPEA, Joule Booster bundle |