
LLM Infrastructure and Data Protection

This document describes the infrastructure, data flow, and security architecture of the Large Language Model (LLM) inference services used by the Streamdiver platform. It is intended to provide transparency for data protection officers and security assessors evaluating CLOUD Act exposure, data sovereignty, and the handling of sensitive content during AI processing.

Document Information
Document ID        TC-LLM-001
Version            1.3
Date               2026-02-16
Scope              Streamdiver LLM Inference Infrastructure
Applicable Law     GDPR (EU), öDSG (AT); compatible with BDSG (DE) and FADP/nDSG (CH)
Related Documents  TC-CDP-001 — Cryptography & Data Protection

1. Purpose

Streamdiver operates its own LLM inference infrastructure on dedicated servers in Europe. No external AI services (such as OpenAI, Google, Anthropic, or Microsoft) are used. All inference is performed within Streamdiver's own infrastructure boundary.


2. Architecture Overview


3. Inference Framework

Property    Details
Framework   vLLM
API         OpenAI-compatible REST API
License     Open Source (Apache 2.0)
Developer   vLLM Project (open-source community)
Deployment  Self-hosted on dedicated GPU servers

vLLM is a high-performance open-source inference engine. Streamdiver operates it as a self-hosted service — there is no service relationship, telemetry, or data exchange with any external party.
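Because the API is OpenAI-compatible, the inference endpoint can be called with a plain HTTP request. The sketch below is illustrative only: the host name, port, and model name are placeholder assumptions, not Streamdiver's actual configuration.

```python
import json
import urllib.request

# Hypothetical internal endpoint -- reachable only inside the private
# tailnet. Host, port, and model name are illustrative placeholders.
VLLM_URL = "http://llm.tailnet.internal:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }


def complete(prompt: str, model: str = "example-model") -> str:
    """Send the prompt to the self-hosted vLLM server and return the reply."""
    body = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        VLLM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]
```

Since the API contract is stable, exchanging the underlying model only changes the `model` field; calling code is unaffected.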


4. Models

Streamdiver uses state-of-the-art open-source models for all LLM-based processing tasks (summarization, question generation, RAG-based retrieval, entity extraction).

Property         Details
Model Source     Reputable open-source models from established providers (e.g., Meta LLaMA, Mistral AI)
Model Hosting    Downloaded and served locally on Streamdiver infrastructure
Model Selection  Architecture is model-agnostic; models can be exchanged without platform changes
Model Training   No fine-tuning or training on customer data
Model Updates    Models are updated periodically to leverage improvements in the open-source ecosystem

The specific model in use may change as the open-source ecosystem evolves. The architecture is designed to be model-agnostic — the inference API remains stable regardless of the underlying model.


5. Network Isolation

The LLM inference servers are not publicly accessible. They are exclusively reachable within the Streamdiver Tailscale tailnet — a private, encrypted overlay network.

Property         Details
Network          Streamdiver Tailscale tailnet (private overlay)
Encryption       WireGuard (ChaCha20-Poly1305, Curve25519 key exchange)
Access Control   Identity-based; only authorized Streamdiver services can reach the LLM endpoints
Public Exposure  None — no public IP, no public DNS, no internet-facing ports
Monitoring       All connections are logged within the tailnet control plane

No traffic to or from the LLM servers traverses the public internet.
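As a sketch of how such identity-based access control can be expressed, Tailscale ACL policies are written in HuJSON (JSON with comments). The tag names and port below are illustrative assumptions, not Streamdiver's actual policy:

```jsonc
{
  "acls": [
    // Only nodes tagged as platform services may reach the LLM
    // inference port; all other access is denied by default.
    {
      "action": "accept",
      "src": ["tag:platform"],
      "dst": ["tag:llm-inference:8000"]
    }
  ]
}
```

Tailnet ACLs are default-deny: any connection not matched by an accept rule is refused, which is what keeps the LLM endpoints unreachable from everywhere else.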


6. Data Flow

6.1 Inbound (Platform → LLM)

  1. The RAG engine within the Streamdiver platform constructs a prompt (context + query).
  2. The prompt is sent to the vLLM server via the encrypted Tailscale tunnel.
  3. The vLLM server processes the prompt in RAM and returns the result.
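The prompt construction in step 1 can be sketched in a few lines; the template below is a hypothetical illustration, not Streamdiver's actual RAG prompt format.

```python
def build_rag_prompt(context_chunks: list[str], query: str) -> str:
    """Assemble retrieved context and the user query into one prompt (step 1)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The resulting string is what travels through the encrypted tunnel in step 2; it exists only in memory on both ends.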

6.2 Outbound (LLM → External)

None. The LLM server has no outbound internet access. It cannot:

  • Send data to any external AI service
  • Phone home to any vendor
  • Transmit telemetry or usage data
  • Access any resource outside the Tailscale tailnet
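One way such an outbound block can be enforced at the host level is a default-drop egress policy in nftables. This is a simplified, hypothetical sketch: the interface name is an assumption, and a real deployment must additionally permit the UDP transport that carries the WireGuard tunnel itself (omitted here for brevity):

```shell
# Default-drop egress policy: only loopback and the tailnet
# interface (tailscale0) may carry outbound traffic.
nft add table inet egress
nft add chain inet egress output \
    '{ type filter hook output priority 0 ; policy drop ; }'
nft add rule inet egress output oifname "lo" accept
nft add rule inet egress output oifname "tailscale0" accept
```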

6.3 Data Lifecycle on the LLM Server

Phase             Data Location                             Duration
Request received  RAM (prompt + context)                    Milliseconds to seconds
Inference         GPU VRAM + RAM                            Duration of inference
Response sent     Transmitted via WireGuard tunnel          Immediate
After response    Prompt and context released from memory   Immediate

No data is written to disk at any point during inference. The LLM server is stateless — it retains no customer data between requests.


7. Hosting

LLM inference workloads run on dedicated GPU servers provided by multiple European hosting partners. The specific provider is selected based on capacity and workload requirements; all providers meet the same baseline security criteria.

Property                      Details
Providers                     Exoscale (Switzerland/Austria), Hetzner Online GmbH (Germany), Verda (Finland)
Locations                     Austria, Germany, Finland
Provider Jurisdictions        Swiss, Austrian, German, and Finnish law — all within the EU/EEA or Switzerland (EU adequacy decision)
Infrastructure Certification  ISO 27001
Server Type                   Dedicated GPU servers (not shared / not multi-tenant)
Physical Security             Biometric access control, 24/7 surveillance, individual rack locking
CLOUD Act Exposure            None — all providers are European companies with no US ownership

8. CLOUD Act Assessment

The US CLOUD Act compels providers subject to US jurisdiction to disclose data in their possession, custody, or control, regardless of where that data is stored. The following assessment covers all layers of the LLM infrastructure:

Layer                Component                  Jurisdiction                            CLOUD Act Risk
Hosting Providers    Exoscale, Hetzner, Verda   Switzerland/Austria, Germany, Finland   None
Inference Framework  vLLM (open source)         N/A (open source, self-hosted)          None
Models               Open-source models         N/A (open source, self-hosted)          None
Network              Tailscale                  Canada (Tailscale Inc.)                 See below
Encryption           WireGuard                  N/A (open source)                       None

Tailscale Note: Tailscale Inc. is a Canadian company. The Tailscale control plane coordinates key exchange and ACL policies but has no access to data traffic: packets flow between nodes through WireGuard tunnels with end-to-end encryption, and even when a relay forwards them, it carries only encrypted packets. Tailscale cannot decrypt the traffic. Even in the event of a theoretical legal request, only connection metadata (which nodes are connected) would be available, never content.

Summary: No US company has access to data processed by or stored on the LLM infrastructure. There is no CLOUD Act exposure.


9. Customer Data Usage

Is customer data used to train or fine-tune models?
  No. Customer data is never used to train or fine-tune LLM models.
Is customer data retained after inference?
  No. The LLM server is stateless; data is released from memory immediately after the response is sent.
Is customer data shared with any third party?
  No. No data leaves the Streamdiver infrastructure.
Are inference prompts or results logged?
  No. No prompts or inference results are written to logs on the LLM server.