New
Foundation Models
GA

SAP Generative AI Hub

SAP's managed gateway to 20+ leading foundation models — hosted on SAP AI Core so prompts and completions never leave SAP's infrastructure. Includes the Orchestration Service, prompt lifecycle management, RAG with HANA Vector Engine, and the SAP AI SDK.

Overview

The SAP Generative AI Hub is a managed service running on SAP AI Core that provides access to third-party large language models (LLMs) and multimodal foundation models through SAP-controlled endpoints. The fundamental enterprise differentiator is data sovereignty: prompts and completions route through SAP infrastructure — customer data is never sent directly to model providers and is not used for model retraining.

Developers access models through an OpenAI-compatible chat completions API, meaning existing LangChain, LlamaIndex, or OpenAI SDK code can target Gen AI Hub with minimal changes. The SAP AI SDK for Python and JavaScript wraps these endpoints in idiomatic client libraries with BTP authentication built in.

The Orchestration Service adds a production-grade layer above raw model calls: Jinja2 prompt templating, automatic RAG grounding via the HANA Vector Engine, PII masking, content filtering, and structured output parsing — enabling enterprise LLM applications without custom middleware.

Data Sovereignty
Prompts routed via SAP-managed endpoints. Data never sent directly to model vendors.
20+ Foundation Models
GPT-4o, Claude, Gemini, Llama, Mistral — unified API, zero integration effort.
Orchestration Service
Templating, RAG grounding, PII masking, content filtering in a declarative pipeline.
HANA Vector Engine RAG
Native vector similarity search in SAP HANA Cloud for grounded, fact-based responses.
Prompt Management
Version, evaluate, and A/B test prompts. Lifecycle from draft to production.
SAP AI SDK
Python (sap-ai-sdk) and JavaScript (@sap-ai-sdk) with BTP auth and LangChain compatibility.

Architecture & Data Sovereignty

SAP Generative AI Hub — Architecture and Data Flow
Data sovereignty guarantee: All inference requests from SAP applications route through SAP-operated proxy endpoints on AI Core. Customer applications never call the model provider directly; SAP acts as the data controller, and prompts and completions are covered by SAP's Data Processing Agreement (DPA). Model vendors do not use customer data for model fine-tuning.

Available Foundation Models

| Model | Provider | Type | Context | Best For | Status |
|---|---|---|---|---|---|
| gpt-4o | OpenAI / Azure | Chat + Vision | 128K | Complex reasoning, code generation, multimodal | GA |
| gpt-4o-mini | OpenAI / Azure | Chat | 128K | High-volume, cost-efficient tasks | GA |
| claude-3-5-sonnet | Anthropic | Chat + Vision | 200K | Long document analysis, instruction following | GA |
| claude-3-haiku | Anthropic | Chat | 200K | Low-latency chatbots, high-throughput tasks | GA |
| gemini-1.5-pro | Google | Chat + Vision | 1M | Very long context, multimodal workflows | GA |
| llama-3-70b | Meta (open weights) | Chat | 128K | On-premise candidate, fine-tuning with BYOM | GA |
| llama-3-8b | Meta (open weights) | Chat | 128K | Edge deployment, low-latency serving | GA |
| mistral-large | Mistral AI | Chat | 128K | European data residency, instruction following | GA |
| text-embedding-3-large | OpenAI / Azure | Embedding | 8K | RAG indexing, semantic search (3072 dimensions) | GA |
| text-embedding-ada-002 | OpenAI / Azure | Embedding | 8K | Lightweight RAG (1536 dimensions) | GA |
| dall-e-3 | OpenAI / Azure | Image Generation | N/A | Document illustration, training material generation | GA |
| claude-3-opus | Anthropic | Chat + Vision | 200K | Highest capability for complex multi-step reasoning | Planned |

Model availability varies by BTP region and SAP AI Core plan. The model catalogue is continuously updated. Verify current availability in the Gen AI Hub model catalogue in your BTP subaccount. Planned items are on the SAP Road Map and not yet Generally Available.

Orchestration Service

Orchestration Service — Pipeline Stages
Template Engine (Jinja2)
Renders Jinja2 prompt templates with runtime variable substitution. Supports few-shot example injection, conditional blocks, and loop constructs. Templates are version-controlled in the Prompt Management store.
Grounding Module (RAG)
Before calling the LLM, the Grounding Module embeds the user query, performs approximate nearest-neighbour search in the HANA Vector Engine, and injects the top-K relevant document chunks into the prompt context.
Input / Output Masking
PII detection and anonymisation applied to the prompt before sending to the LLM. De-anonymisation maps PII back to original values in the final response. Prevents customer data from appearing in model logs.
LLM Call
Routes to the configured Gen AI Hub model. Handles retry on transient errors, rate-limit back-off, token budget enforcement, and optional streaming. Model selection can be dynamically overridden per request.
Content Filter
Azure AI Content Safety and SAP-specific safety classifiers applied to both input and output. Each harm category (Hate, Violence, Self-Harm, Sexual) has a configurable threshold of 0, 2, 4, or 6, where 0 is the strictest setting (only content rated safe passes) and 6 is the most permissive.
Groundedness Check
Optional post-processing step that verifies the LLM response is factually grounded in the retrieved context — reducing hallucination risk. Responses failing the groundedness threshold can be flagged or blocked.
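
The masking and de-anonymisation stages described above can be pictured as a small round trip. The sketch below is a toy stand-in, not the Orchestration Service's masking implementation: the e-mail regex, the `<EMAIL_n>` placeholder format, and the `fake_llm` function are all assumptions made for illustration.

```python
import re

# Toy pattern for e-mail addresses only; a real PII detector covers many more entity types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text):
    """Replace each distinct e-mail address with a placeholder token.

    Returns the masked text and a token -> original mapping for later restoration.
    """
    token_by_value = {}

    def substitute(match):
        value = match.group(0)
        if value not in token_by_value:
            token_by_value[value] = f"<EMAIL_{len(token_by_value) + 1}>"
        return token_by_value[value]

    masked = EMAIL_RE.sub(substitute, text)
    return masked, {token: value for value, token in token_by_value.items()}


def unmask(text, mapping):
    """Put the original values back into a model response."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def fake_llm(prompt):
    """Hypothetical model call: it only ever sees the masked prompt."""
    return f"Reply drafted for {prompt.split()[-1]}"


prompt = "Draft a reply to jane.doe@example.com"
masked_prompt, mapping = mask_emails(prompt)      # the model sees "<EMAIL_1>"
response = unmask(fake_llm(masked_prompt), mapping)
```

Because the mapping stays on the caller's side of the model call, the original address never appears in the text sent to the model.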

Prompt Lifecycle Management

Prompt Template Lifecycle — Draft to Production

Prompt Management Capabilities

Version Control
Every prompt change creates a new version. Previous versions remain accessible for rollback.
Test Console
Interactive UI in AI Launchpad to test prompts against a selected model before promotion.
A/B Testing
Route a percentage of traffic to a new prompt version while the existing version stays live.
Template Variables
Jinja2 syntax: {{?variable}} for required, {{variable|default("fallback")}} for optional.
Few-Shot Examples
Store reusable example lists that are injected into templates at render time.
API Access
Retrieve and render templates programmatically via the Prompt Management REST API.
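
To make the placeholder semantics above concrete, here is a minimal sketch of a renderer that treats `{{?name}}` as required and `{{name|default("...")}}` as optional. It is a hypothetical stand-in written with the standard library only; the `render` function and its error behaviour are assumptions, not the Prompt Management implementation.

```python
import re

# Matches {{?name}}, {{name}} and {{name|default("fallback")}}.
PLACEHOLDER = re.compile(
    r'\{\{\s*(\?)?\s*(\w+)\s*(?:\|\s*default\("([^"]*)"\))?\s*\}\}'
)


def render(template, values):
    """Fill placeholders from `values`; fall back to a default when one is declared."""

    def substitute(match):
        name, default = match.group(2), match.group(3)
        if name in values:
            return str(values[name])
        if default is not None:
            return default
        raise KeyError(f"missing required template variable: {name}")

    return PLACEHOLDER.sub(substitute, template)


template = 'Summarise {{?document}} as {{style|default("bullet points")}}.'
rendered = render(template, {"document": "the invoice"})
```

Failing fast on a missing required variable surfaces a broken call before any tokens are spent on an incomplete prompt.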

Prompt Design Best Practices

One task per template
Keep each template focused on a single, well-defined task. Avoid multi-purpose prompts.
System + User separation
Use SystemMessage for persona and constraints. Use UserMessage for the actual task and variables.
Constrain output format
Specify JSON schema, bullet count, or word limit in the system message. Reduces post-processing.
Hallucination guard
Add "Only use information from the provided context. If unsure, say so." to grounded templates.
Version before deploying
Always create a new version when modifying a production template. Never edit in place.

RAG with SAP HANA Cloud Vector Engine

Retrieval-Augmented Generation — Document Ingestion and Query Flow
REAL_VECTOR Column Type

HANA Cloud natively stores vectors in REAL_VECTOR columns with ANN (approximate nearest neighbour) index support. No separate vector database required.

Distance Metrics

HANA Vector Engine supports L2 (Euclidean), Cosine Similarity, and Inner Product distance functions. Cosine similarity is recommended for text embeddings.
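
The three metrics differ in how they treat vector length, which is one reason cosine similarity is a common choice for text embeddings. The following plain-Python sketch (helper names are illustrative, not HANA functions) shows the definitions on two vectors that point in the same direction but differ in magnitude.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product of the normalised vectors: depends on direction only."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the length

same_direction = cosine_similarity(a, b)  # approximately 1.0
distance = l2_distance(a, b)              # sqrt(14), despite identical direction
```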

Hybrid Search

Combine full-text BM25 keyword search with vector similarity using HANA's CONTAINS() + COSINE_SIMILARITY() for improved recall on short queries.
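
One way to picture hybrid search is as a fusion of two score lists. The sketch below uses a weighted sum of min-max normalised scores, a technique chosen here for illustration; it is not a description of how HANA combines `CONTAINS()` and `COSINE_SIMILARITY()` internally, and the `hybrid_rank` function and `alpha` weight are assumptions.

```python
def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Rank document ids by alpha * keyword + (1 - alpha) * vector score.

    Each score dict is min-max normalised first so the two scales are comparable.
    A document missing from one list contributes 0 for that list.
    """

    def normalise(scores):
        if not scores:
            return {}
        low, high = min(scores.values()), max(scores.values())
        span = high - low
        return {doc: (s - low) / span if span else 1.0 for doc, s in scores.items()}

    kw, vec = normalise(keyword_scores), normalise(vector_scores)
    fused = {
        doc: alpha * kw.get(doc, 0.0) + (1 - alpha) * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


keyword = {"a": 2.0, "b": 0.0}          # e.g. keyword relevance scores
vector = {"a": 0.2, "b": 0.9, "c": 0.5}  # e.g. cosine similarities

keyword_heavy = [doc for doc, _ in hybrid_rank(keyword, vector, alpha=0.7)]
vector_heavy = [doc for doc, _ in hybrid_rank(keyword, vector, alpha=0.2)]
```

Raising `alpha` favours exact keyword matches, which tends to help with short queries containing product names or codes.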

Recommended embedding model: Use text-embedding-3-large (3072 dimensions, OpenAI via Gen AI Hub) for high-quality semantic search. For HANA Cloud tables, declare the column as REAL_VECTOR(3072). Smaller embeddings (text-embedding-ada-002, 1536 dim) are suitable for high-throughput, lower-cost scenarios.
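
The dimension choice has a direct storage cost. As a rough back-of-the-envelope sketch, assuming each vector element is stored as a 32-bit float and ignoring index, header, and compression effects (both assumptions, not HANA sizing guidance):

```python
BYTES_PER_ELEMENT = 4  # assumption: 32-bit floating point per vector element


def raw_vector_bytes(rows, dimensions):
    """Raw payload size of the embedding column, excluding any overhead."""
    return rows * dimensions * BYTES_PER_ELEMENT


large = raw_vector_bytes(1_000_000, 3072)  # text-embedding-3-large
small = raw_vector_bytes(1_000_000, 1536)  # text-embedding-ada-002

print(f"3072 dims: {large / 1e9:.3f} GB per million rows")
print(f"1536 dims: {small / 1e9:.3f} GB per million rows")
```

Halving the dimension halves the raw payload, which is the trade-off behind choosing the smaller model for high-throughput scenarios.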

Python — Chat Completions & Embeddings

The SAP AI SDK exposes Gen AI Hub models through a LangChain-compatible interface. Switching between models served by the same provider client requires only a model name change; models from other providers use the matching client class behind the same LangChain interface.

chat_completion.py
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# The SAP AI SDK wraps the Gen AI Hub in LangChain-compatible clients.
# Set AICORE_SERVICE_KEY environment variable to your BTP service key JSON.

proxy_client = get_proxy_client("gen-ai-hub")

# Use GPT-4o via SAP Gen AI Hub
chat_gpt4o = ChatOpenAI(
    proxy_client=proxy_client,
    proxy_model_name="gpt-4o",
    max_tokens=2000,
    temperature=0.3,
)

response = chat_gpt4o.invoke(
    "Summarise the key changes in SAP S/4HANA 2025 OP regarding Clean Core compliance."
)
print(response.content)

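# Switch to Claude 3.5 Sonnet: ChatOpenAI only targets OpenAI deployments, so let the
# SDK's init_llm helper pick the client class that matches the model.
# The model name must match the name shown in your Gen AI Hub model catalogue.
from gen_ai_hub.proxy.langchain.init_models import init_llm

chat_claude = init_llm("claude-3-5-sonnet", proxy_client=proxy_client)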

# Streaming response
for chunk in chat_claude.stream(
    "Explain SAP Principal Propagation in ABAP Cloud in 3 bullet points."
):
    print(chunk.content, end="", flush=True)

embeddings.py
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

proxy_client = get_proxy_client("gen-ai-hub")

# Create embeddings using text-embedding-3-large via Gen AI Hub
embeddings = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)

# Embed a single document
doc_vector = embeddings.embed_query(
    "SAP Clean Core requires all extensions to use released APIs only."
)
print(f"Vector dimension: {len(doc_vector)}")  # 3072

# Embed a batch of documents (for bulk indexing into HANA Vector Engine)
documents = [
    "SAP Joule is the generative AI copilot embedded across SAP applications.",
    "SAP Build Apps provides a no-code drag-and-drop application builder.",
    "The Generative AI Hub runs on SAP AI Core with data residency guarantees.",
]
vectors = embeddings.embed_documents(documents)
print(f"Embedded {len(vectors)} documents, dimension: {len(vectors[0])}")

Python — Vector Search with SAP HANA Cloud

hana_vector_rag.py
import hdbcli.dbapi as hdb
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# Connect to SAP HANA Cloud
conn = hdb.connect(
    address="<your-hana>.hanacloud.ondemand.com",
    port=443,
    user="DBUSER",
    password="<password>",
    encrypt=True,
)

cursor = conn.cursor()

# Create a table with a REAL_VECTOR column (HANA Vector Engine)
cursor.execute("""
    CREATE COLUMN TABLE SAP_KNOWLEDGE_BASE (
        ID          NVARCHAR(100) PRIMARY KEY,
        CONTENT     NCLOB,
        TOPIC       NVARCHAR(200),
        SOURCE_URL  NVARCHAR(500),
        EMBEDDING   REAL_VECTOR(3072)   -- dimension matches text-embedding-3-large
    )
""")

# Embed and store documents
proxy_client = get_proxy_client("gen-ai-hub")
embedding_model = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)

documents = [
    {
        "id": "clean-core-001",
        "content": "SAP Clean Core: all extensions must use released APIs. No custom Z-tables in the SAP namespace.",
        "topic": "Clean Core",
        "url": "https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE/2025"
    },
]

for doc in documents:
    vector = embedding_model.embed_query(doc["content"])
    vector_str = "[" + ",".join(str(v) for v in vector) + "]"
    cursor.execute(
        "INSERT INTO SAP_KNOWLEDGE_BASE VALUES (?, ?, ?, ?, TO_REAL_VECTOR(?))",
        (doc["id"], doc["content"], doc["topic"], doc["url"], vector_str)
    )

conn.commit()
print("Documents indexed in HANA Vector Engine")

# Similarity search (cosine distance)
query = "What are the rules for SAP Clean Core extensions?"
query_vector = embedding_model.embed_query(query)
query_vector_str = "[" + ",".join(str(v) for v in query_vector) + "]"

cursor.execute("""
    SELECT TOP 5
        ID, CONTENT, TOPIC,
        COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR(?)) AS SCORE
    FROM SAP_KNOWLEDGE_BASE
    ORDER BY SCORE DESC
""", (query_vector_str,))

results = cursor.fetchall()
for row in results:
    print(f"[{row[3]:.4f}] {row[1][:100]}...")
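
Long source documents are normally split into chunks before the embedding and INSERT loop shown above, so that each row holds one retrievable passage. The following is a minimal character-based sketch; the `chunk_text` helper, the chunk size, and the overlap are illustrative assumptions, and production pipelines often split on sentence or token boundaries instead.

```python
def chunk_text(text, max_chars=800, overlap=100):
    """Split text into windows of at most max_chars, each sharing `overlap`
    characters with the previous window so context is not cut mid-thought."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []

    step = max_chars - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)

# Derive one row id per chunk from a (hypothetical) parent document id.
rows = [(f"clean-core-001-{i:03d}", chunk) for i, chunk in enumerate(chunks)]
```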

Python — Orchestration Service (RAG + Filtering)

The Orchestration Service provides a declarative pipeline combining RAG grounding, prompt templates, and content filtering. The example below uses a HANA Vector Engine data repository for automatic context retrieval before the LLM call.

orchestration.py
from gen_ai_hub.orchestration.service import OrchestrationService
from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
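from gen_ai_hub.orchestration.models.document_grounding import (
    GroundingModule,
    GroundingType,
    DocumentGrounding,
    DocumentGroundingFilter,
    GroundingFilterSearch,
    DataRepositoryType,
)
from gen_ai_hub.orchestration.models.azure_content_filter import (
    AzureContentFilter,
    AzureThreshold,
)
from gen_ai_hub.orchestration.models.content_filtering import (
    ContentFiltering,
    InputFiltering,
    OutputFiltering,
)

# Orchestration Service: declarative pipeline with RAG + filtering.
# NOTE: the grounding and filtering module paths and class names have changed
# between SDK releases; check them against the SDK version you have installed.

# 1. Define the LLM
llm = LLM(
    name="gpt-4o",                   # Gen AI Hub model name
    parameters={"max_tokens": 1000, "temperature": 0.2},
)

# 2. Define the prompt template
template = Template(
    messages=[
        SystemMessage(
            "You are an SAP expert assistant. Answer ONLY using the provided context. "
            "If the answer is not in the context, say 'I do not have information on this topic.'"
        ),
        UserMessage(
            "Context:\n{{?context}}\n\n"
            "Question: {{?question}}\n\n"
            "Answer in clear, structured bullet points."
        ),
    ]
)

# 3. Configure RAG grounding: the question is embedded, the top chunks are
#    retrieved from the vector repository and written to the "context" placeholder
grounding = GroundingModule(
    type=GroundingType.DOCUMENT_GROUNDING_SERVICE.value,
    config=DocumentGrounding(
        input_params=["question"],
        output_param="context",
        filters=[
            DocumentGroundingFilter(
                id="vector_search",
                data_repository_type=DataRepositoryType.VECTOR.value,
                search_config=GroundingFilterSearch(max_chunk_count=5),
            )
        ],
    ),
)

# 4. Content filtering (Azure AI Content Safety), threshold 4 on every category
azure_filter = AzureContentFilter(
    hate=AzureThreshold.ALLOW_SAFE_LOW_MEDIUM,
    violence=AzureThreshold.ALLOW_SAFE_LOW_MEDIUM,
    self_harm=AzureThreshold.ALLOW_SAFE_LOW_MEDIUM,
    sexual=AzureThreshold.ALLOW_SAFE_LOW_MEDIUM,
)
content_filtering = ContentFiltering(
    input_filtering=InputFiltering(filters=[azure_filter]),
    output_filtering=OutputFiltering(filters=[azure_filter]),
)

# 5. Assemble the orchestration config
config = OrchestrationConfig(
    llm=llm,
    template=template,
    grounding=grounding,
    filtering=content_filtering,
)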

# 6. Run the orchestration pipeline
service = OrchestrationService(api_url="<orchestration-endpoint-url>", config=config)

result = service.run(
    template_values=[
        TemplateValue(name="question", value="How do I implement a BAdI in ABAP Cloud?"),
    ]
)

print(result.orchestration_result.choices[0].message.content)
# Output: Grounded answer citing retrieved knowledge base chunks

Python — Reusable Prompt Templates

prompt_template.py
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

# Prompt templates are managed in the Gen AI Hub UI and versioned.
# Reference a saved template by its ID and inject runtime variables.

proxy_client = get_proxy_client("gen-ai-hub")
chat = ChatOpenAI(proxy_client=proxy_client, proxy_model_name="gpt-4o")

# Example: PO approval recommendation prompt template (saved in Gen AI Hub)
# Template source (defined in Gen AI Hub UI):
#   "Analyse this purchase order and recommend approve/reject/escalate.
#    PO Number: {{po_number}}
#    Vendor: {{vendor_name}}
#    Amount: {{amount}} {{currency}}
#    Category: {{spend_category}}
#    Requestor: {{requestor_name}}
#    Justification: {{justification}}
#    Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"

def run_po_approval_template(po_data: dict) -> dict:
    # In production, retrieve the template from Gen AI Hub Prompt Management API
    # and render with the variables. Here shown as a direct implementation.
    prompt = f"""Analyse this purchase order and recommend approve/reject/escalate.
PO Number: {po_data['po_number']}
Vendor: {po_data['vendor_name']}
Amount: {po_data['amount']} {po_data['currency']}
Category: {po_data['spend_category']}
Requestor: {po_data['requestor_name']}
Justification: {po_data['justification']}
Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"""

    response = chat.invoke(prompt)
    return {"recommendation": response.content, "model": "gpt-4o"}

# Example invocation
result = run_po_approval_template({
    "po_number": "4500012345",
    "vendor_name": "TechSupplies GmbH",
    "amount": "87500",
    "currency": "EUR",
    "spend_category": "IT Hardware",
    "requestor_name": "Jane Doe",
    "justification": "Laptop refresh cycle for Q2 2025 — 50 units Dell Latitude."
})
print(result["recommendation"])

CAP Integration — Generative AI in SAP Applications

CAP services integrate with Gen AI Hub using the @sap-ai-sdk/foundation-models npm package. The service definition uses standard CDS actions, and the implementation calls Gen AI Hub models with full BTP auth handled by the SDK.

srv/ai-service.cds
// File: srv/ai-service.cds
service AiService {
  action analyseDocument(
    documentContent : LargeString,
    analysisType    : String(50)   // 'invoice', 'contract', 'purchase-order'
  ) returns {
    analysis   : LargeString;
    model      : String(100);
    tokensUsed : Integer;
  };

  action summariseInvoice(
    invoiceText  : LargeString,
    maxSentences : Integer default 5
  ) returns {
    summary : LargeString;
  };

  action generateEmailDraft(
    decision  : String(20),    // 'approval', 'rejection', 'escalation'
    poNumber  : String(20),
    vendor    : String(100),
    amount    : String(30),
    reason    : LargeString
  ) returns {
    emailDraft : LargeString;
  };
}

srv/ai-service.js
// CAP service integrating with SAP Generative AI Hub
// File: srv/ai-service.js
'use strict'

const cds = require('@sap/cds')
// SAP AI SDK for JavaScript: Gen AI Hub chat client for Azure OpenAI deployments
// npm install @sap-ai-sdk/foundation-models
const { AzureOpenAiChatClient } = require('@sap-ai-sdk/foundation-models')

module.exports = class AiService extends cds.ApplicationService {
  async init() {
    // Bind AI service handlers
    this.on('analyseDocument', this.analyseDocument)
    this.on('summariseInvoice', this.summariseInvoice)
    this.on('generateEmailDraft', this.generateEmailDraft)
    await super.init()
  }

  /**
   * Analyse a document and extract structured data.
   * Uses GPT-4o via Gen AI Hub — model name matches Gen AI Hub catalogue.
   */
  async analyseDocument(req) {
    const { documentContent, analysisType } = req.data

    // AzureOpenAiChatClient routes through SAP Gen AI Hub (not directly to Azure)
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o' })

    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP document processing assistant. Extract structured data.'
        },
        {
          role: 'user',
          content: `Analyse this ${analysisType} document and return JSON:\n\n${documentContent}`
        }
      ],
      response_format: { type: 'json_object' },
      max_tokens: 1500,
      temperature: 0.1,
    })

    return {
      analysis: response.getContent(),
      model: response.data.model,
      tokensUsed: response.data.usage?.total_tokens,
    }
  }

  /**
   * Summarise an invoice using a reusable prompt template.
   */
  async summariseInvoice(req) {
    const { invoiceText, maxSentences } = req.data

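    // AzureOpenAiChatClient only serves Azure OpenAI deployments, so an OpenAI model is used here;
    // non-OpenAI models such as Claude are reached through the Orchestration Service client.
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o' })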

    const response = await client.run({
      messages: [
        {
          role: 'user',
          content: `Summarise this invoice in ${maxSentences} sentences. \
Extract: vendor, total amount, currency, due date, key line items.\n\n${invoiceText}`
        }
      ],
      max_tokens: 500,
    })

    return { summary: response.getContent() }
  }

  /**
   * Generate an approval/rejection email draft.
   */
  async generateEmailDraft(req) {
    const { decision, poNumber, vendor, amount, reason } = req.data

    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o-mini' })

    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP procurement assistant. Write professional email drafts.'
        },
        {
          role: 'user',
          content: `Write a ${decision} email for PO ${poNumber} from vendor ${vendor} \
(Amount: ${amount}). Reason: ${reason}. Keep it under 100 words. Professional tone.`
        }
      ],
      max_tokens: 300,
      temperature: 0.4,
    })

    return { emailDraft: response.getContent() }
  }
}

Required npm packages: @sap-ai-sdk/ai-api for AI API control-plane operations, and @sap-ai-sdk/foundation-models for the OpenAI-compatible chat and embedding clients. Both are BTP-auth-aware and use the VCAP_SERVICES binding automatically in Cloud Foundry and Kyma runtimes.

REST API — Direct Integration

The Gen AI Hub exposes an OpenAI-compatible REST API. Any client that supports the OpenAI /chat/completions and /embeddings endpoints can target Gen AI Hub by changing the base URL and adding BTP token authentication.

api-examples.sh
### Gen AI Hub — Chat Completions API (OpenAI-compatible)
### Direct REST call — no SDK required

# Obtain a token from BTP UAA (standard OAuth2 client credentials)
TOKEN=$(curl -s -X POST \
  "https://<tenant>.authentication.eu10.hana.ondemand.com/oauth/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=${AICORE_CLIENT_ID}" \
  -d "client_secret=${AICORE_CLIENT_SECRET}" \
  | jq -r '.access_token')

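# Chat Completions: GPT-4o via Gen AI Hub
# Azure OpenAI deployments also expect an api-version query parameter;
# use the version documented for your deployment.
curl -X POST \
  "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/<deployment-id>/chat/completions?api-version=<api-version>" \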
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "system", "content": "You are an SAP technical expert." },
      { "role": "user",   "content": "Explain the difference between ABAP Cloud Tier 1 and Tier 2." }
    ],
    "max_tokens": 500,
    "temperature": 0.3
  }'

# Embeddings endpoint (for RAG indexing)
curl -X POST \
  ".../v2/inference/deployments/<embedding-deployment-id>/embeddings" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": [
      "SAP Clean Core principles for extensibility",
      "ABAP Cloud released API usage guidelines"
    ]
  }'
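
The same call can be assembled without any SDK from Python's standard library. The sketch below only builds the request object so that it can be inspected or tested offline; the `build_chat_request` helper and its parameters are hypothetical, and the optional `api_version` argument is an assumption for deployments that expect that query parameter.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(base_url, deployment_id, token, messages,
                       resource_group="default", max_tokens=500, api_version=None):
    """Assemble a chat completions POST request for a Gen AI Hub deployment."""
    url = (
        f"{base_url.rstrip('/')}/v2/inference/deployments/"
        f"{deployment_id}/chat/completions"
    )
    if api_version:
        url += "?" + urllib.parse.urlencode({"api-version": api_version})

    body = json.dumps({"messages": messages, "max_tokens": max_tokens})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": resource_group,
            "Content-Type": "application/json",
        },
    )


request = build_chat_request(
    "https://example.invalid/",          # placeholder for the AI API base URL
    "d123",                               # placeholder deployment id
    "token-from-oauth",                   # placeholder bearer token
    [{"role": "user", "content": "Explain ABAP Cloud in one sentence."}],
)
```

Passing the request to `urllib.request.urlopen(request)` would send it; keeping construction separate from sending makes the headers and payload easy to unit-test.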

Content Filtering & Guardrails

The Orchestration Service integrates Azure AI Content Safety as the default content filter, with SAP-managed configuration. Filters are applied to both the user input (before the LLM call) and the model output (before returning to the caller).

Harm Categories

Hate Speech
Threshold 0, 2, 4, or 6: 0 is the strictest (only content rated safe passes), 6 the most permissive
Violence
Configurable threshold per deployment
Self-Harm
Configurable threshold per deployment
Sexual Content
Configurable threshold per deployment
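
A threshold can be read as the highest severity that is still let through. The sketch below encodes that reading in plain Python; the rule "blocked when detected severity exceeds the threshold", the severity scale of 0, 2, 4, 6, and the `blocked_categories` helper are assumptions for illustration rather than the filter's documented behaviour.

```python
SEVERITY_LEVELS = (0, 2, 4, 6)  # assumed scale: 0 = safe ... 6 = high severity
CATEGORIES = ("Hate", "Violence", "SelfHarm", "Sexual")


def blocked_categories(severities, thresholds):
    """Return the categories whose detected severity exceeds the configured threshold.

    `severities` maps category -> detected severity (missing means 0, i.e. safe).
    `thresholds` maps category -> highest severity that is still allowed.
    """
    for category in CATEGORIES:
        if thresholds[category] not in SEVERITY_LEVELS:
            raise ValueError(f"invalid threshold for {category}")
    return [c for c in CATEGORIES if severities.get(c, 0) > thresholds[c]]


moderate = {c: 4 for c in CATEGORIES}   # same setting for every category
strict = {c: 0 for c in CATEGORIES}     # strictest setting

detected = {"Hate": 2, "Violence": 6}
blocked_moderate = blocked_categories(detected, moderate)
blocked_strict = blocked_categories(detected, strict)
```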

Additional Guardrails

PII Masking
Automatic PII detection and anonymisation on input and de-anonymisation on output
Groundedness Check
Optional verification that LLM output is supported by the retrieved RAG context
Token Budget
Max token limits enforced per request to control cost and latency
Rate Limiting
Per-resource-group rate limits enforced by AI Core infrastructure
Data Residency
All inference within SAP-operated infrastructure in the deployed BTP region
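
Callers that hit a rate limit usually retry with exponential back-off and jitter rather than failing immediately. The following is a minimal client-side sketch; the `RateLimited` exception and `call_with_backoff` helper are hypothetical names, and the delays are illustrative rather than values recommended by SAP.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a client when the service answers HTTP 429."""


def call_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Invoke `call`, retrying on RateLimited with exponentially growing delays.

    The delay before retry n is base_delay * 2**n plus up to base_delay of jitter.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


# Demonstration with a fake call that is throttled twice before succeeding.
attempts = {"count": 0}
recorded_delays = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimited()
    return "ok"


result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
```

Injecting the `sleep` function keeps the helper testable without real waiting.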

Enterprise Use Cases

Document Intelligence

Extract structured data from invoices, contracts, and purchase orders. Return typed JSON payloads to CAP services or BTP Integration Suite flows.

Internal Knowledge Search

Index SAP Help Portal articles, internal wikis, and process documentation in HANA Vector Engine. Answer employee questions with grounded, cited responses.

Joule Custom Skills

Joule Studio skills call Gen AI Hub directly for NLU, slot-filling, and response generation — backed by custom knowledge sources grounded in company data.

Code Generation & Review

Generate ABAP Cloud code snippets, CAP service templates, or Fiori UI5 views. Review pull requests for Clean Core compliance via automated pipeline steps.

Report Narration (SAC)

SAP Analytics Cloud Smart Insights calls Gen AI Hub to narrate dashboard changes in natural language — explaining variance, trends, and outliers.

Process Automation AI

SAP Build Process Automation workflows call Gen AI Hub to make LLM-powered decisions within BPMN flows — email classification, escalation logic, approval recommendations.

Road Map

Status: Generally Available | Planned | Roadmap | Future Direction
Generally Available
Generative AI Hub — 20+ models GA
GPT-4o, Claude 3.5, Gemini 1.5, Llama 3, Mistral, embedding models — all Generally Available.
Generally Available
Orchestration Service (RAG + Filtering)
Generally Available. Declarative pipeline with grounding, PII masking, and content filtering.
Generally Available
HANA Vector Engine RAG integration
REAL_VECTOR column type and COSINE_SIMILARITY function — GA in HANA Cloud.
Generally Available
Prompt Lifecycle Management UI
Version, test, and promote prompts via AI Launchpad. Generally Available.
Generally Available
SAP AI SDK (Python + JavaScript)
sap-ai-sdk (pip) and @sap-ai-sdk (npm) — GA with LangChain-compatible interface.
Generally Available
Fine-tuning on custom data (Llama 3)
Bring Your Own Model (BYOM) fine-tuning with LoRA on SAP AI Core. Generally Available.
Generally Available
Groundedness Check in Orchestration
Factual grounding verification — GA as optional pipeline stage.
Planned
Multi-modal inputs (image + text)
Vision inputs via GPT-4o and Gemini Pro Vision. Planned — SAP Road Map.
Planned
Structured Output (JSON Schema enforcement)
Enforce a JSON schema on LLM output via Orchestration Service. Planned — SAP Road Map.
Planned
Streaming via Orchestration Service
Server-sent events (SSE) streaming through the Orchestration pipeline. On the SAP Road Map.
Roadmap
LLM cost analytics dashboard
Per-resource-group token cost attribution and budget alerts. On the SAP Road Map.
Future Direction
Graph RAG (entity + relationship retrieval)
Knowledge graph-based retrieval combining vector search and property graph traversal. Future Direction.

Licensing & Commercial Model

AI

Generative AI Hub

Generally Available · GA: 20+ foundation models available; model catalogue continuously updated

SAP's curated access point for 20+ foundation models (GPT-4o, Claude, Gemini, Llama, DALL-E, and SAP-specific models) — with data privacy, usage tracking, and SAP context grounding.

CPEA
Tokens | Inference Units

Access via SAP AI Core (Standard plan). Token consumption billed per model per 1,000 tokens. All inference processed within SAP-operated infrastructure for data sovereignty.

AI

AI Core

Generally Available · GA

SAP's MLOps service on SAP BTP — providing infrastructure for AI model training, deployment, serving, and lifecycle management including access to the Generative AI Hub.

CPEA
Resource Units | Inference Units | Storage (GB)

CPEA consumption-based: Resource Units for model training/serving, Inference Units for production AI workloads. Storage charged separately.

Generative AI Hub — Access Options

| Feature | Free Tier | Standard | Joule Booster |
|---|---|---|---|
| Plan | Via AI Core Free | CPEA consumption-based | Bundled with Joule |
| Status | Generally Available | Generally Available | Generally Available |
| Access & Models | | | |
| Access method | AI API + Gen AI Hub UI | AI API + Orchestration + UI | Pre-configured via Joule entitlement |
| Models available | Limited subset | Full 20+ model catalogue | Joule-specific model routing |
| Monthly token quota | Limited (exploration) | Based on CPEA balance | Included in Joule Booster |
| Orchestration & RAG | | | |
| Orchestration Service | | | |
| RAG grounding (HANA) | | | |
| Prompt Management | Basic | Full lifecycle (version, A/B test) | Basic (Joule use cases) |
| Safety & Governance | | | |
| Content filtering | Default SAP settings | Configurable thresholds | SAP-managed defaults |
| PII masking | | Available via Orchestration | Available via Orchestration |
| Data residency controls | SAP-managed defaults | Region-pinned | SAP-managed defaults |
| Enterprise | | | |
| Fine-tuning (BYOM): AI Core Standard required | | | |
| SLA | None (best-effort) | SAP standard BTP SLA | Included in Joule SLA |
| Commercial model | Free | CPEA (per 1K tokens per model) | CPEA, Joule Booster bundle |

Important: Gen AI Hub token consumption is billed per 1,000 tokens per model and rates differ significantly between models. GPT-4o mini and Llama 3 8B are substantially cheaper than GPT-4o or Claude 3.5 Sonnet. Implement token budget controls in the Orchestration Service for production workloads. Consult your SAP account executive for current CPEA token rates.
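
Because billing is per 1,000 tokens per model, a simple estimate is enough to compare model choices before committing a workload. The rates and model labels in this sketch are invented placeholders in arbitrary units, not SAP prices; `estimate_cost` and `check_budget` are hypothetical helpers.

```python
import math

# Hypothetical rates in arbitrary currency units per 1,000 tokens; real CPEA rates differ.
HYPOTHETICAL_RATES_PER_1K = {"large-model": 1.0, "small-model": 0.1}


def estimate_cost(model, tokens, rates=HYPOTHETICAL_RATES_PER_1K):
    """Cost of `tokens` tokens on `model`: tokens / 1000 * rate per 1K tokens."""
    return tokens / 1000 * rates[model]


def check_budget(usage, budget, rates=HYPOTHETICAL_RATES_PER_1K):
    """Sum the estimated cost over all models and compare it with the budget.

    `usage` maps model name -> tokens consumed in the period.
    Returns (total_cost, within_budget).
    """
    total = sum(estimate_cost(model, tokens, rates) for model, tokens in usage.items())
    return total, total <= budget


monthly_usage = {"large-model": 100_000, "small-model": 500_000}
total_cost, within_budget = check_budget(monthly_usage, budget=200.0)
```

With these placeholder rates, five times the token volume on the small model costs half as much as the large-model share, which is the kind of comparison the note above encourages.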

SAP Official References