SAP Generative AI Hub

SAP's managed gateway to 20+ leading foundation models — hosted on SAP AI Core so prompts and completions never leave SAP's infrastructure. Includes the Orchestration Service, prompt lifecycle management, RAG with HANA Vector Engine, and the SAP AI SDK.

Overview

The SAP Generative AI Hub is a managed service running on SAP AI Core that provides access to third-party large language models (LLMs) and multimodal foundation models through SAP-controlled endpoints. The fundamental enterprise differentiator is data sovereignty: prompts and completions route through SAP infrastructure — customer data is never sent directly to model providers and is not used for model retraining.

Developers access models through an OpenAI-compatible chat completions API, meaning existing LangChain, LlamaIndex, or OpenAI SDK code can target Gen AI Hub with minimal changes. The SAP AI SDK for Python and JavaScript wraps these endpoints in idiomatic client libraries with BTP authentication built in.

The Orchestration Service adds a production-grade layer above raw model calls: Jinja2 prompt templating, automatic RAG grounding via the HANA Vector Engine, PII masking, content filtering, and structured output parsing — enabling enterprise LLM applications without custom middleware.

Data Sovereignty

Prompts routed via SAP-managed endpoints. Data never sent directly to model vendors.

20+ Foundation Models

GPT-4o, Claude, Gemini, Llama, and Mistral behind one unified API, with minimal integration effort.

Orchestration Service

Templating, RAG grounding, PII masking, content filtering in a declarative pipeline.

HANA Vector Engine RAG

Native vector similarity search in SAP HANA Cloud for grounded, fact-based responses.

Prompt Management

Version, evaluate, and A/B test prompts. Lifecycle from draft to production.

SAP AI SDK

Python (sap-ai-sdk) and JavaScript (@sap-ai-sdk) with BTP auth and LangChain compatibility.

Architecture & Data Sovereignty

SAP Generative AI Hub — Architecture and Data Flow

Data sovereignty guarantee: All inference requests from SAP applications route through SAP-operated proxy endpoints on AI Core. The model provider never sees the raw request — SAP acts as the data controller, and prompts/completions are covered by SAP's DPA (Data Processing Agreement). No customer data is used for model fine-tuning by the model vendors.

Available Foundation Models

Model	Provider	Type	Context	Best For	Status
gpt-4o	OpenAI / Azure	Chat + Vision	128K	Complex reasoning, code generation, multimodal	GA
gpt-4o-mini	OpenAI / Azure	Chat	128K	High-volume, cost-efficient tasks	GA
claude-3-5-sonnet	Anthropic	Chat + Vision	200K	Long document analysis, instruction following	GA
claude-3-haiku	Anthropic	Chat	200K	Low-latency chatbots, high-throughput tasks	GA
gemini-1.5-pro	Google	Chat + Vision	1M	Very long context, multimodal workflows	GA
llama-3-70b	Meta (open weights)	Chat	128K	On-premise candidate, fine-tuning with BYOM	GA
llama-3-8b	Meta (open weights)	Chat	128K	Edge deployment, low-latency serving	GA
mistral-large	Mistral AI	Chat	128K	European data residency, instruction following	GA
text-embedding-3-large	OpenAI / Azure	Embedding	8K	RAG indexing, semantic search (3072 dimensions)	GA
text-embedding-ada-002	OpenAI / Azure	Embedding	8K	Lightweight RAG (1536 dimensions)	GA
dall-e-3	OpenAI / Azure	Image Generation	N/A	Document illustration, training material generation	GA
claude-3-opus	Anthropic	Chat + Vision	200K	Highest capability for complex multi-step reasoning	Planned

Model availability varies by BTP region and SAP AI Core plan. The model catalogue is continuously updated. Verify current availability in the Gen AI Hub model catalogue in your BTP subaccount. Planned items are on the SAP Road Map and not yet Generally Available.

Orchestration Service

Orchestration Service — Pipeline Stages

Template Engine (Jinja2)

Renders Jinja2 prompt templates with runtime variable substitution. Supports few-shot example injection, conditional blocks, and loop constructs. Templates are version-controlled in the Prompt Management store.

Grounding Module (RAG)

Before calling the LLM, the Grounding Module embeds the user query, performs approximate nearest-neighbour search in the HANA Vector Engine, and injects the top-K relevant document chunks into the prompt context.
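
The retrieval step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Grounding Module's implementation: it uses exact (brute-force) cosine search instead of an ANN index, and the three-dimensional embeddings, the chunk texts, and the function names `retrieve_top_k` and `build_grounded_prompt` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vector, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question, top_chunks):
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n".join(f"- {chunk['text']}" for chunk in top_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy embeddings (hypothetical values; real embeddings have thousands of dimensions)
chunks = [
    {"text": "Clean Core extensions use released APIs.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Joule is SAP's AI copilot.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Released APIs are stable across upgrades.", "embedding": [0.8, 0.3, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]  # stands in for the embedded user question

top_chunks = retrieve_top_k(query_vector, chunks, k=2)
prompt = build_grounded_prompt("What are the Clean Core rules?", top_chunks)
print(prompt)
```

Only the two chunks closest to the query reach the prompt; the unrelated Joule chunk is left out, which is the point of the top-K cut-off.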

Input / Output Masking

PII detection and anonymisation applied to the prompt before sending to the LLM. De-anonymisation maps PII back to original values in the final response. Prevents customer data from appearing in model logs.
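
The mask-then-restore round trip can be illustrated with a small regex-based sketch. It is only an approximation under stated assumptions: it handles e-mail addresses alone, the `<EMAIL_n>` placeholder format and the function names `mask_pii` and `unmask_pii` are hypothetical, and the service's actual detection is not described in this document.

```python
import re

# Simplified e-mail pattern; real PII detection covers many more entity types.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_pii(text):
    """Replace each distinct e-mail address with a numbered placeholder."""
    placeholder_by_value = {}

    def replace(match):
        value = match.group(0)
        if value not in placeholder_by_value:
            placeholder_by_value[value] = f"<EMAIL_{len(placeholder_by_value) + 1}>"
        return placeholder_by_value[value]

    masked = EMAIL_PATTERN.sub(replace, text)
    # Invert the mapping so placeholders can be resolved after the LLM call.
    value_by_placeholder = {p: v for v, p in placeholder_by_value.items()}
    return masked, value_by_placeholder


def unmask_pii(text, value_by_placeholder):
    """Map placeholders in the model response back to the original values."""
    for placeholder, value in value_by_placeholder.items():
        text = text.replace(placeholder, value)
    return text


prompt = "Contact jane.doe@example.com or bob@example.org; cc jane.doe@example.com."
masked_prompt, mapping = mask_pii(prompt)

# The model only ever sees placeholders; its reply is restored afterwards.
model_reply = "I will write to <EMAIL_2> first."
final_reply = unmask_pii(model_reply, mapping)
print(masked_prompt)
print(final_reply)
```

Reusing the same placeholder for a repeated value keeps the masked prompt coherent, so the model can still tell that two mentions refer to the same person.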

LLM Call

Routes to the configured Gen AI Hub model. Handles retry on transient errors, rate-limit back-off, token budget enforcement, and optional streaming. Model selection can be dynamically overridden per request.
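
Retry with back-off is a generic pattern, and a client-side version looks roughly like the sketch below. The `TransientError` class, the `call_with_retry` helper, and the delay values are hypothetical; how the Orchestration Service implements retries internally is not specified here.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 429 or 503."""


def call_with_retry(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Invoke call(), retrying transient failures with exponential back-off."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Simulated model call that is rate-limited twice before succeeding.
attempts = []


def flaky_model_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("rate limited")
    return "completion text"


recorded_delays = []  # capture delays instead of actually sleeping
result = call_with_retry(flaky_model_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

The jitter spreads out retries from many clients so they do not all hit the endpoint again at the same instant.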

Content Filter

Azure AI Content Safety and SAP-specific safety classifiers applied to both input and output. Configurable harm category thresholds (Hate, Violence, Self-Harm, Sexual) from 0 (block all) to 6 (allow high severity).
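
How a per-category threshold turns into a block decision can be sketched as follows. This assumes that a classifier reports one severity per category on the same 0 to 6 scale and that content passes when its severity does not exceed the configured threshold; the comparison rule and the `blocked_categories` helper are assumptions for illustration, not the documented behaviour of the filter.

```python
# Thresholds per harm category, mirroring the 0 to 6 scale described above.
THRESHOLDS = {"Hate": 4, "Violence": 4, "SelfHarm": 4, "Sexual": 4}


def blocked_categories(severities, thresholds=THRESHOLDS):
    """Return the categories whose detected severity exceeds the threshold."""
    return sorted(
        category
        for category, severity in severities.items()
        if severity > thresholds[category]
    )


# Hypothetical classifier output for one prompt.
detected = {"Hate": 0, "Violence": 6, "SelfHarm": 2, "Sexual": 4}
violations = blocked_categories(detected)
print("blocked" if violations else "allowed", violations)

# With the strictest setting, any non-zero severity is rejected.
strict = {category: 0 for category in THRESHOLDS}
strict_violations = blocked_categories(detected, strict)
```

Running the same check on both the prompt and the completion gives the input and output filtering described above.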

Groundedness Check

Optional post-processing step that verifies the LLM response is factually grounded in the retrieved context — reducing hallucination risk. Responses failing the groundedness threshold can be flagged or blocked.
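
To make the idea of a groundedness threshold concrete, the sketch below scores a response by simple token overlap with the retrieved context. This is a deliberately crude stand-in heuristic, not the method the service uses; the function names, the 0.7 threshold, and the "pass"/"flag" labels are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_score(response, context):
    """Fraction of response tokens that also appear in the retrieved context."""
    response_tokens = tokenize(response)
    if not response_tokens:
        return 0.0
    return len(response_tokens & tokenize(context)) / len(response_tokens)


def check_groundedness(response, context, threshold=0.7):
    """Flag responses whose overlap with the context falls below the threshold."""
    score = groundedness_score(response, context)
    return {"score": score, "action": "pass" if score >= threshold else "flag"}


context = "Extensions must use released APIs only."
grounded = check_groundedness("Extensions must use released APIs.", context)
ungrounded = check_groundedness("Extensions may modify standard tables directly.", context)
print(grounded, ungrounded)
```

A production check would compare meaning rather than surface tokens, but the control flow (score, compare to threshold, flag or block) is the same.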

Prompt Lifecycle Management

Prompt Template Lifecycle — Draft to Production

Prompt Management Capabilities

Version Control: Every prompt change creates a new version. Previous versions remain accessible for rollback.
Test Console: Interactive UI in AI Launchpad to test prompts against a selected model before promotion.
A/B Testing: Route a percentage of traffic to a new prompt version while the existing version stays live.
Template Variables: Jinja2 syntax: {{?variable}} for required, {{variable|default("fallback")}} for optional.
Few-Shot Examples: Store reusable example lists that are injected into templates at render time.
API Access: Retrieve and render templates programmatically via the Prompt Management REST API.
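
The difference between required and optional template variables listed above can be illustrated with a small stand-in renderer. It is not the Prompt Management template engine: it is a regex sketch that understands only the two placeholder forms shown in the list, and the `render` function is hypothetical.

```python
import re

# Matches {{?name}} (required) and {{name|default("fallback")}} (optional).
PLACEHOLDER = re.compile(
    r'\{\{\s*(?P<required>\?)?(?P<name>\w+)\s*'
    r'(?:\|\s*default\("(?P<default>[^"]*)"\))?\s*\}\}'
)


def render(template, values):
    """Substitute placeholders; fail fast when a required variable is missing."""

    def substitute(match):
        name = match.group("name")
        if name in values:
            return str(values[name])
        if match.group("required"):
            raise KeyError(f"missing required template variable: {name}")
        return match.group("default") or ""

    return PLACEHOLDER.sub(substitute, template)


template = 'Summarise {{?document}} in {{language|default("English")}}.'

with_default = render(template, {"document": "the invoice"})
with_override = render(template, {"document": "the invoice", "language": "German"})
print(with_default)
print(with_override)
```

Failing on a missing required variable at render time is cheaper than sending an incomplete prompt to a model and debugging the answer.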

Prompt Design Best Practices

One task per template

Keep each template focused on a single, well-defined task. Avoid multi-purpose prompts.

System + User separation

Use SystemMessage for persona and constraints. Use UserMessage for the actual task and variables.

Constrain output format

Specify JSON schema, bullet count, or word limit in the system message. Reduces post-processing.

Hallucination guard

Add "Only use information from the provided context. If unsure, say so." to grounded templates.

Version before deploying

Always create a new version when modifying a production template. Never edit in place.

RAG with SAP HANA Cloud Vector Engine

Retrieval-Augmented Generation — Document Ingestion and Query Flow

REAL_VECTOR Column Type

HANA Cloud natively stores vectors in REAL_VECTOR columns with ANN (approximate nearest neighbour) index support. No separate vector database required.

Distance Metrics

HANA Vector Engine supports L2 (Euclidean), Cosine Similarity, and Inner Product distance functions. Cosine similarity is recommended for text embeddings.
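
The three metrics are easy to compare on a toy example. The sketch below computes them in plain Python for two vectors that point in the same direction but differ in length; the function names are illustrative and unrelated to the HANA SQL functions.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance between the two vector end points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Inner product normalised by both vector lengths."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(l2_distance(a, b))        # 5.0
print(inner_product(a, b))      # 50.0
print(cosine_similarity(a, b))  # 1.0 up to floating-point rounding
```

Cosine similarity reports a perfect match because it ignores vector length, while L2 distance and inner product both change with magnitude. That insensitivity to length is one reason cosine similarity is a common default for text embeddings.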

Hybrid Search

Combine full-text BM25 keyword search with vector similarity using HANA's CONTAINS() + COSINE_SIMILARITY() for improved recall on short queries.
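
One common way to merge a keyword ranking with a vector ranking on the application side is reciprocal rank fusion (RRF). The sketch below shows that technique as an example of hybrid scoring; it is not taken from the HANA documentation, and the document IDs, the `reciprocal_rank_fusion` helper, and the constant `k=60` are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists, best match first.
keyword_ranking = ["doc-2", "doc-1", "doc-3"]  # e.g. from a full-text query
vector_ranking = ["doc-1", "doc-3", "doc-2"]   # e.g. from a similarity query

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

RRF works on ranks rather than raw scores, so keyword relevance and cosine similarity do not need to be normalised to a common scale first.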

Recommended embedding model: Use text-embedding-3-large (3072 dimensions, OpenAI via Gen AI Hub) for high-quality semantic search. For HANA Cloud tables, declare the column as REAL_VECTOR(3072). Smaller embeddings (text-embedding-ada-002, 1536 dim) are suitable for high-throughput, lower-cost scenarios.
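
The dimension choice has a direct storage cost, which a short calculation makes visible. The sketch assumes each vector component is stored as a 32-bit float (4 bytes) and ignores index and row overhead; the row count is a hypothetical example.

```python
BYTES_PER_COMPONENT = 4  # assumption: 32-bit floating-point components


def raw_vector_bytes(dimensions, rows):
    """Raw payload size of the embedding column, excluding any overhead."""
    return dimensions * BYTES_PER_COMPONENT * rows


rows = 1_000_000  # hypothetical number of indexed chunks
large = raw_vector_bytes(3072, rows)  # text-embedding-3-large
small = raw_vector_bytes(1536, rows)  # text-embedding-ada-002

print(f"3072 dimensions: {large / 1e9:.2f} GB")
print(f"1536 dimensions: {small / 1e9:.2f} GB")
```

Under these assumptions, halving the dimension halves the raw vector payload, which is part of the trade-off between retrieval quality and cost mentioned above.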

Python — Chat Completions & Embeddings

The SAP AI SDK exposes Gen AI Hub models through a LangChain-compatible interface. Switching between models requires only a model name change — all models share the same API.

chat_completion.py

from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# The SAP AI SDK wraps the Gen AI Hub in LangChain-compatible clients.
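# Configure credentials from your SAP AI Core service key, for example via the
# AICORE_AUTH_URL, AICORE_CLIENT_ID, AICORE_CLIENT_SECRET, AICORE_BASE_URL and
# AICORE_RESOURCE_GROUP environment variables.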

proxy_client = get_proxy_client("gen-ai-hub")

# Use GPT-4o via SAP Gen AI Hub
chat_gpt4o = ChatOpenAI(
    proxy_client=proxy_client,
    proxy_model_name="gpt-4o",
    max_tokens=2000,
    temperature=0.3,
)

response = chat_gpt4o.invoke(
    "Summarise the key changes in SAP S/4HANA 2025 OP regarding Clean Core compliance."
)
print(response.content)

# Switch to Claude 3.5 Sonnet with no code changes — just change the model name
chat_claude = ChatOpenAI(
    proxy_client=proxy_client,
    proxy_model_name="claude-3-5-sonnet",
)

# Streaming response
for chunk in chat_claude.stream(
    "Explain SAP Principal Propagation in ABAP Cloud in 3 bullet points."
):
    print(chunk.content, end="", flush=True)

embeddings.py

from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

proxy_client = get_proxy_client("gen-ai-hub")

# Create embeddings using text-embedding-3-large via Gen AI Hub
embeddings = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)

# Embed a single document
doc_vector = embeddings.embed_query(
    "SAP Clean Core requires all extensions to use released APIs only."
)
print(f"Vector dimension: {len(doc_vector)}")  # 3072

# Embed a batch of documents (for bulk indexing into HANA Vector Engine)
documents = [
    "SAP Joule is the generative AI copilot embedded across SAP applications.",
    "SAP Build Apps provides a no-code drag-and-drop application builder.",
    "The Generative AI Hub runs on SAP AI Core with data residency guarantees.",
]
vectors = embeddings.embed_documents(documents)
print(f"Embedded {len(vectors)} documents, dimension: {len(vectors[0])}")

Python — Vector Search with SAP HANA Cloud

hana_vector_rag.py

import hdbcli.dbapi as hdb
from gen_ai_hub.proxy.langchain.openai import OpenAIEmbeddings
from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client

# Connect to SAP HANA Cloud
conn = hdb.connect(
    address="<your-hana>.hanacloud.ondemand.com",
    port=443,
    user="DBUSER",
    password="<password>",
    encrypt=True,
)

cursor = conn.cursor()

# Create a table with a REAL_VECTOR column (HANA Vector Engine)
cursor.execute("""
    CREATE COLUMN TABLE SAP_KNOWLEDGE_BASE (
        ID          NVARCHAR(100) PRIMARY KEY,
        CONTENT     NCLOB,
        TOPIC       NVARCHAR(200),
        SOURCE_URL  NVARCHAR(500),
        EMBEDDING   REAL_VECTOR(3072)   -- dimension matches text-embedding-3-large
    )
""")

# Embed and store documents
proxy_client = get_proxy_client("gen-ai-hub")
embedding_model = OpenAIEmbeddings(
    proxy_client=proxy_client,
    proxy_model_name="text-embedding-3-large",
)

documents = [
    {
        "id": "clean-core-001",
        "content": "SAP Clean Core: all extensions must use released APIs. No custom Z-tables in the SAP namespace.",
        "topic": "Clean Core",
        "url": "https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE/2025"
    },
]

for doc in documents:
    vector = embedding_model.embed_query(doc["content"])
    vector_str = "[" + ",".join(str(v) for v in vector) + "]"
    cursor.execute(
        "INSERT INTO SAP_KNOWLEDGE_BASE VALUES (?, ?, ?, ?, TO_REAL_VECTOR(?))",
        (doc["id"], doc["content"], doc["topic"], doc["url"], vector_str)
    )

conn.commit()
print("Documents indexed in HANA Vector Engine")

# Similarity search (cosine distance)
query = "What are the rules for SAP Clean Core extensions?"
query_vector = embedding_model.embed_query(query)
query_vector_str = "[" + ",".join(str(v) for v in query_vector) + "]"

cursor.execute("""
    SELECT TOP 5
        ID, CONTENT, TOPIC,
        COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR(?)) AS SCORE
    FROM SAP_KNOWLEDGE_BASE
    ORDER BY SCORE DESC
""", (query_vector_str,))

results = cursor.fetchall()
for row in results:
    print(f"[{row[3]:.4f}] {row[1][:100]}...")

Python — Orchestration Service (RAG + Filtering)

The Orchestration Service provides a declarative pipeline combining RAG grounding, prompt templates, and content filtering. The example below uses a HANA Vector Engine data repository for automatic context retrieval before the LLM call.

orchestration.py

from gen_ai_hub.orchestration.service import OrchestrationService
from gen_ai_hub.orchestration.models.config import OrchestrationConfig
from gen_ai_hub.orchestration.models.llm import LLM
from gen_ai_hub.orchestration.models.template import Template, TemplateValue
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage
from gen_ai_hub.orchestration.models.grounding import (
    GroundingModule,
    DocumentGrounding,
    GroundingFilterSearch,
)
from gen_ai_hub.orchestration.models.content_filter import (
    ContentFilter,
    AzureContentFilter,
)

# Orchestration Service — declarative pipeline with RAG + filtering

# 1. Define the LLM
llm = LLM(
    name="gpt-4o",                   # Gen AI Hub model name
    parameters={"max_tokens": 1000, "temperature": 0.2},
)

# 2. Define the prompt template (Jinja2)
template = Template(
    messages=[
        SystemMessage(
            "You are an SAP expert assistant. Answer ONLY using the provided context. "
            "If the answer is not in the context, say 'I do not have information on this topic.'"
        ),
        UserMessage(
            "Context:\n{{?context}}\n\n"
            "Question: {{?question}}\n\n"
            "Answer in clear, structured bullet points."
        ),
    ]
)

# 3. Configure RAG grounding (HANA Vector Engine)
grounding = GroundingModule(
    type="document_grounding_service",
    config=DocumentGrounding(
        input_params=["question"],
        output_param="context",
        filters=[GroundingFilterSearch(id="vector_search", data_repository_type="vector")],
    ),
)

# 4. Content filtering (Azure AI Content Safety)
content_filter = ContentFilter(
    input=AzureContentFilter(
        Hate=4, Violence=4, SelfHarm=4, Sexual=4
    ),
    output=AzureContentFilter(
        Hate=4, Violence=4, SelfHarm=4, Sexual=4
    ),
)

# 5. Assemble the orchestration config
config = OrchestrationConfig(
    llm=llm,
    template=template,
    grounding=grounding,
    content_filters=content_filter,
)

# 6. Run the orchestration pipeline
service = OrchestrationService(api_url="<orchestration-endpoint-url>", config=config)

result = service.run(
    template_values=[
        TemplateValue(name="question", value="How do I implement a BAdI in ABAP Cloud?"),
    ]
)

print(result.orchestration_result.choices[0].message.content)
# Output: Grounded answer citing retrieved knowledge base chunks

Python — Reusable Prompt Templates

prompt_template.py

from gen_ai_hub.proxy.core.proxy_clients import get_proxy_client
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI

# Prompt templates are managed in the Gen AI Hub UI and versioned.
# Reference a saved template by its ID and inject runtime variables.

proxy_client = get_proxy_client("gen-ai-hub")
chat = ChatOpenAI(proxy_client=proxy_client, proxy_model_name="gpt-4o")

# Example: PO approval recommendation prompt template (saved in Gen AI Hub)
# Template source (defined in Gen AI Hub UI):
#   "Analyse this purchase order and recommend approve/reject/escalate.
#    PO Number: {{po_number}}
#    Vendor: {{vendor_name}}
#    Amount: {{amount}} {{currency}}
#    Category: {{spend_category}}
#    Requestor: {{requestor_name}}
#    Justification: {{justification}}
#    Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"

def run_po_approval_template(po_data: dict) -> dict:
    # In production, retrieve the template from Gen AI Hub Prompt Management API
    # and render with the variables. Here shown as a direct implementation.
    prompt = f"""Analyse this purchase order and recommend approve/reject/escalate.
PO Number: {po_data['po_number']}
Vendor: {po_data['vendor_name']}
Amount: {po_data['amount']} {po_data['currency']}
Category: {po_data['spend_category']}
Requestor: {po_data['requestor_name']}
Justification: {po_data['justification']}
Respond with: Decision, Risk Level (Low/Medium/High), Reason (max 2 sentences)"""

    response = chat.invoke(prompt)
    return {"recommendation": response.content, "model": "gpt-4o"}

# Example invocation
result = run_po_approval_template({
    "po_number": "4500012345",
    "vendor_name": "TechSupplies GmbH",
    "amount": "87500",
    "currency": "EUR",
    "spend_category": "IT Hardware",
    "requestor_name": "Jane Doe",
    "justification": "Laptop refresh cycle for Q2 2025 — 50 units Dell Latitude."
})
print(result["recommendation"])

CAP Integration — Generative AI in SAP Applications

CAP services integrate with Gen AI Hub using the @sap-ai-sdk/foundation-models npm package. The service definition uses standard CDS actions, and the implementation calls Gen AI Hub models with full BTP auth handled by the SDK.

srv/ai-service.cds

// File: srv/ai-service.cds
service AiService {
  action analyseDocument(
    documentContent : LargeString,
    analysisType    : String(50)   // 'invoice', 'contract', 'purchase-order'
  ) returns {
    analysis   : LargeString;
    model      : String(100);
    tokensUsed : Integer;
  };

  action summariseInvoice(
    invoiceText  : LargeString,
    maxSentences : Integer default 5
  ) returns {
    summary : LargeString;
  };

  action generateEmailDraft(
    decision  : String(20),    // 'approval', 'rejection', 'escalation'
    poNumber  : String(20),
    vendor    : String(100),
    amount    : String(30),
    reason    : LargeString
  ) returns {
    emailDraft : LargeString;
  };
}

srv/ai-service.js

// CAP service integrating with SAP Generative AI Hub
// File: srv/ai-service.js
'use strict'

const cds = require('@sap/cds')
// SAP AI SDK for JavaScript
const { AiCoreClient } = require('@sap-ai-sdk/ai-api')

// GenAI Hub chat client (OpenAI-compatible interface)
// npm install @sap-ai-sdk/foundation-models
const { AzureOpenAiChatClient } = require('@sap-ai-sdk/foundation-models')

module.exports = class AiService extends cds.ApplicationService {
  async init() {
    // Bind AI service handlers
    this.on('analyseDocument', this.analyseDocument)
    this.on('summariseInvoice', this.summariseInvoice)
    this.on('generateEmailDraft', this.generateEmailDraft)
    await super.init()
  }

  /**
   * Analyse a document and extract structured data.
   * Uses GPT-4o via Gen AI Hub — model name matches Gen AI Hub catalogue.
   */
  async analyseDocument(req) {
    const { documentContent, analysisType } = req.data

    // AzureOpenAiChatClient routes through SAP Gen AI Hub (not directly to Azure)
    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o' })

    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP document processing assistant. Extract structured data.'
        },
        {
          role: 'user',
          content: `Analyse this ${analysisType} document and return JSON:\n\n${documentContent}`
        }
      ],
      response_format: { type: 'json_object' },
      max_tokens: 1500,
      temperature: 0.1,
    })

    return {
      analysis: response.getContent(),
      model: response.data.model,
      tokensUsed: response.data.usage?.total_tokens,
    }
  }

  /**
   * Summarise an invoice using a reusable prompt template.
   */
  async summariseInvoice(req) {
    const { invoiceText, maxSentences } = req.data

    const client = new AzureOpenAiChatClient({ modelName: 'claude-3-5-sonnet' })

    const response = await client.run({
      messages: [
        {
          role: 'user',
          content: `Summarise this invoice in ${maxSentences} sentences. \
Extract: vendor, total amount, currency, due date, key line items.\n\n${invoiceText}`
        }
      ],
      max_tokens: 500,
    })

    return { summary: response.getContent() }
  }

  /**
   * Generate an approval/rejection email draft.
   */
  async generateEmailDraft(req) {
    const { decision, poNumber, vendor, amount, reason } = req.data

    const client = new AzureOpenAiChatClient({ modelName: 'gpt-4o-mini' })

    const response = await client.run({
      messages: [
        {
          role: 'system',
          content: 'You are an SAP procurement assistant. Write professional email drafts.'
        },
        {
          role: 'user',
          content: `Write a ${decision} email for PO ${poNumber} from vendor ${vendor} \
(Amount: ${amount}). Reason: ${reason}. Keep it under 100 words. Professional tone.`
        }
      ],
      max_tokens: 300,
      temperature: 0.4,
    })

    return { emailDraft: response.getContent() }
  }
}

Required npm packages: @sap-ai-sdk/ai-api for AI API control-plane operations, and @sap-ai-sdk/foundation-models for the OpenAI-compatible chat and embedding clients. Both are BTP-auth-aware and use the VCAP_SERVICES binding automatically in Cloud Foundry and Kyma runtimes.

REST API — Direct Integration

The Gen AI Hub exposes an OpenAI-compatible REST API. Any client that supports the OpenAI /chat/completions and /embeddings endpoints can target Gen AI Hub by changing the base URL and adding BTP token authentication.

api-examples.sh

### Gen AI Hub — Chat Completions API (OpenAI-compatible)
### Direct REST call — no SDK required

# Obtain a token from BTP UAA (standard OAuth2 client credentials)
TOKEN=$(curl -s -X POST \
  "https://<tenant>.authentication.eu10.hana.ondemand.com/oauth/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=${AICORE_CLIENT_ID}" \
  -d "client_secret=${AICORE_CLIENT_SECRET}" \
  | jq -r '.access_token')

# Chat Completions — GPT-4o via Gen AI Hub
curl -X POST \
  "https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/<deployment-id>/chat/completions" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "system", "content": "You are an SAP technical expert." },
      { "role": "user",   "content": "Explain the difference between ABAP Cloud Tier 1 and Tier 2." }
    ],
    "max_tokens": 500,
    "temperature": 0.3
  }'

# Embeddings endpoint (for RAG indexing)
curl -X POST \
  ".../v2/inference/deployments/<embedding-deployment-id>/embeddings" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "AI-Resource-Group: default" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-large",
    "input": [
      "SAP Clean Core principles for extensibility",
      "ABAP Cloud released API usage guidelines"
    ]
  }'
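
For Python callers without the SDK, the same chat completions request can be assembled with the standard library. The sketch only builds the request object and does not send it; the base URL, deployment ID, and token are placeholders, and `build_chat_request` is a hypothetical helper that mirrors the headers and body of the curl call above.

```python
import json
import urllib.request


def build_chat_request(base_url, deployment_id, token, messages,
                       resource_group="default"):
    """Build (but do not send) a chat completions request."""
    url = f"{base_url}/v2/inference/deployments/{deployment_id}/chat/completions"
    payload = {
        "model": "gpt-4o",
        "messages": messages,
        "max_tokens": 500,
        "temperature": 0.3,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "AI-Resource-Group": resource_group,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    base_url="https://ai-api.example.invalid",  # placeholder host
    deployment_id="d0000000",                   # placeholder deployment ID
    token="dummy-token",                        # placeholder OAuth token
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)

# With real values, sending would look like:
#   with urllib.request.urlopen(request) as response:
#       reply = json.load(response)
```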

Content Filtering & Guardrails

The Orchestration Service integrates Azure AI Content Safety as the default content filter, with SAP-managed configuration. Filters are applied to both the user input (before the LLM call) and the model output (before returning to the caller).

Harm Categories

Hate Speech: 0 (block all) to 6 (allow high severity)
Violence: Configurable threshold per deployment
Self-Harm: Configurable threshold per deployment
Sexual Content: Configurable threshold per deployment

Additional Guardrails

PII Masking: Automatic PII detection and anonymisation on input and de-anonymisation on output
Groundedness Check: Optional verification that LLM output is supported by the retrieved RAG context
Token Budget: Max token limits enforced per request to control cost and latency
Rate Limiting: Per-resource-group rate limits enforced by AI Core infrastructure
Data Residency: All inference within SAP-operated infrastructure in the deployed BTP region

Enterprise Use Cases

Document Intelligence

Extract structured data from invoices, contracts, and purchase orders. Return typed JSON payloads to CAP services or BTP Integration Suite flows.

Internal Knowledge Search

Index SAP Help Portal articles, internal wikis, and process documentation in HANA Vector Engine. Answer employee questions with grounded, cited responses.

Joule Custom Skills

Joule Studio skills call Gen AI Hub directly for NLU, slot-filling, and response generation — backed by custom knowledge sources grounded in company data.

Code Generation & Review

Generate ABAP Cloud code snippets, CAP service templates, or Fiori UI5 views. Review pull requests for Clean Core compliance via automated pipeline steps.

Report Narration (SAC)

SAP Analytics Cloud Smart Insights calls Gen AI Hub to narrate dashboard changes in natural language — explaining variance, trends, and outliers.

Process Automation AI

SAP Build Process Automation workflows call Gen AI Hub to make LLM-powered decisions within BPMN flows — email classification, escalation logic, approval recommendations.

Road Map

Status: Generally Available | Planned | Roadmap | Future Direction

Generally Available

Generative AI Hub — 20+ models GA

GPT-4o, Claude 3.5, Gemini 1.5, Llama 3, Mistral, embedding models — all Generally Available.

Generally Available

Orchestration Service (RAG + Filtering)

Generally Available. Declarative pipeline with grounding, PII masking, and content filtering.

Generally Available

HANA Vector Engine RAG integration

REAL_VECTOR column type and COSINE_SIMILARITY function — GA in HANA Cloud.

Generally Available

Prompt Lifecycle Management UI

Version, test, and promote prompts via AI Launchpad. Generally Available.

Generally Available

SAP AI SDK (Python + JavaScript)

sap-ai-sdk (pip) and @sap-ai-sdk (npm) — GA with LangChain-compatible interface.

Generally Available

Fine-tuning on custom data (Llama 3)

Bring Your Own Model (BYOM) fine-tuning with LoRA on SAP AI Core. Generally Available.

Generally Available

Groundedness Check in Orchestration

Factual grounding verification — GA as optional pipeline stage.

Planned

Multi-modal inputs (image + text)

Vision inputs via GPT-4o and Gemini Pro Vision. Planned — SAP Road Map.

Planned

Structured Output (JSON Schema enforcement)

Enforce a JSON schema on LLM output via Orchestration Service. Planned — SAP Road Map.

Planned

Streaming via Orchestration Service

Server-sent events (SSE) streaming through the Orchestration pipeline. On the SAP Road Map.

Roadmap

LLM cost analytics dashboard

Per-resource-group token cost attribution and budget alerts. On the SAP Road Map.

Future Direction

Graph RAG (entity + relationship retrieval)

Knowledge graph-based retrieval combining vector search and property graph traversal. Future Direction.

Licensing & Commercial Model

Generative AI Hub

Generally Available: 20+ foundation models available; model catalogue continuously updated

SAP's curated access point for 20+ foundation models (GPT-4o, Claude, Gemini, Llama, DALL-E, and SAP-specific models) — with data privacy, usage tracking, and SAP context grounding.

CPEA

Tokens, Inference Units

Access via SAP AI Core (Standard plan). Token consumption billed per model per 1,000 tokens. All inference processed within SAP-operated infrastructure for data sovereignty.

AI Core

Generally Available

SAP's MLOps service on SAP BTP — providing infrastructure for AI model training, deployment, serving, and lifecycle management including access to the Generative AI Hub.

CPEA

Resource Units, Inference Units, Storage (GB)

CPEA consumption-based: Resource Units for model training/serving, Inference Units for production AI workloads. Storage charged separately.

Generative AI Hub — Access Options

Feature	Free Tier (via AI Core Free; Generally Available)	Standard (CPEA consumption-based; Generally Available)	Joule Booster (bundled with Joule; Generally Available)
Access & Models
Access method	AI API + Gen AI Hub UI	AI API + Orchestration + UI	Pre-configured via Joule entitlement
Models available	Limited subset	Full 20+ model catalogue	Joule-specific model routing
Monthly token quota	Limited (exploration)	Based on CPEA balance	Included in Joule Booster
Orchestration & RAG
Orchestration Service
RAG grounding (HANA)
Prompt Management	Basic	Full lifecycle (version, A/B test)	Basic (Joule use cases)
Safety & Governance
Content filtering	Default SAP settings	Configurable thresholds	SAP-managed defaults
PII masking		Available via Orchestration	Available via Orchestration
Data residency controls	SAP-managed defaults	Region-pinned	SAP-managed defaults
Enterprise
Fine-tuning (BYOM)		AI Core Standard required
SLA	None (best-effort)	SAP standard BTP SLA	Included in Joule SLA
Commercial model	Free	CPEA (per 1K tokens per model)	CPEA — Joule Booster bundle

Important: Gen AI Hub token consumption is billed per 1,000 tokens per model and rates differ significantly between models. GPT-4o mini and Llama 3 8B are substantially cheaper than GPT-4o or Claude 3.5 Sonnet. Implement token budget controls in the Orchestration Service for production workloads. Consult your SAP account executive for current CPEA token rates.
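
Because billing is per 1,000 tokens and rates differ by model, a rough estimate before go-live is straightforward. The rates below are invented placeholders in arbitrary units, not SAP prices, and the `estimate_cost` and `within_budget` helpers are hypothetical; real CPEA rates have to come from SAP.

```python
# Hypothetical rates per 1,000 tokens in arbitrary units (NOT actual prices).
HYPOTHETICAL_RATES_PER_1K_TOKENS = {
    "gpt-4o": 2.0,
    "gpt-4o-mini": 0.125,
}


def estimate_cost(model, total_tokens, rates=HYPOTHETICAL_RATES_PER_1K_TOKENS):
    """Cost of a workload when billing is per 1,000 tokens per model."""
    return total_tokens / 1000 * rates[model]


def within_budget(model, total_tokens, budget,
                  rates=HYPOTHETICAL_RATES_PER_1K_TOKENS):
    """True when the estimated cost stays at or below the budget."""
    return estimate_cost(model, total_tokens, rates) <= budget


monthly_tokens = 40_000_000  # hypothetical monthly volume
large_cost = estimate_cost("gpt-4o", monthly_tokens)
small_cost = estimate_cost("gpt-4o-mini", monthly_tokens)
print(f"gpt-4o: {large_cost:.0f} units, gpt-4o-mini: {small_cost:.0f} units")
```

Running this kind of estimate per use case makes it easier to decide which workloads justify a larger model and which can move to a cheaper one.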

SAP Official References

SAP Generative AI Hub — Help Portal

Official product documentation including model catalogue, API reference, and orchestration guide.

SAP AI SDK for Python (PyPI)

sap-ai-sdk — Python client library for AI API and Gen AI Hub. Includes LangChain integration.

SAP AI SDK for JavaScript (npm)

@sap-ai-sdk/ai-api and @sap-ai-sdk/foundation-models — TypeScript/JavaScript SDK.

Gen AI Hub on Discovery Center

Service overview, pricing model, available regions, and trial setup.

SAP HANA Cloud Vector Engine

REAL_VECTOR column type, ANN index, and similarity function reference.

SAP AI Core Road Map

Official SAP product road map for AI Core and Gen AI Hub — planned features and timeline.