LLM Infrastructure and Data Protection
This document describes the infrastructure, data flow, and security architecture of the Large Language Model (LLM) inference services used by the Streamdiver platform. It is intended to provide transparency for data protection officers and security assessors evaluating CLOUD Act exposure, data sovereignty, and the handling of sensitive content during AI processing.
| Property | Details |
|---|---|
| Document ID | TC-LLM-001 |
| Version | 1.3 |
| Date | 2026-02-16 |
| Scope | Streamdiver LLM Inference Infrastructure |
| Applicable Law | GDPR (EU), öDSG (AT); compatible with BDSG (DE) and FADP/nDSG (CH) |
| Related Documents | TC-CDP-001 — Cryptography & Data Protection |
1. Purpose
Streamdiver operates its own LLM inference infrastructure on dedicated servers in Europe. No external AI services (such as OpenAI, Google, Anthropic, or Microsoft) are used. All inference is performed within Streamdiver's own infrastructure boundary.
2. Architecture Overview
The LLM stack consists of three layers, each described in the sections below: a self-hosted vLLM inference framework (Section 3) serving open-source models (Section 4), reachable exclusively over a private Tailscale tailnet (Section 5). All inference runs on dedicated GPU servers operated by European hosting partners (Section 7).
3. Inference Framework
| Property | Details |
|---|---|
| Framework | vLLM |
| API | OpenAI-compatible REST API |
| License | Open Source (Apache 2.0) |
| Developer | vLLM Project (open-source community) |
| Deployment | Self-hosted on dedicated GPU servers |
vLLM is a high-performance open-source inference engine. Streamdiver operates it as a self-hosted service — there is no service relationship, telemetry, or data exchange with any external party.
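Because vLLM exposes an OpenAI-compatible REST API, platform services can reach the self-hosted endpoint with any standard HTTP client. The sketch below is illustrative only: the tailnet hostname `llm-inference`, port, and model name are assumptions, not values from this document.

```python
import json
import urllib.request

# Tailnet-internal endpoint (illustrative hostname; never publicly routable).
VLLM_URL = "http://llm-inference:8000/v1/chat/completions"

def build_request(prompt: str,
                  model: str = "meta-llama/Llama-3.1-8B-Instruct") -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for the self-hosted vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("Summarize the attached transcript.")
# Actually sending the request only succeeds from inside the tailnet:
# body = urllib.request.urlopen(req).read()
```

Because the API surface is the OpenAI wire format, swapping the underlying open-source model (Section 4) changes only the `model` field, not the calling code.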
4. Models
Streamdiver uses state-of-the-art open-source models for all LLM-based processing tasks (summarization, question generation, RAG-based retrieval, entity extraction).
| Property | Details |
|---|---|
| Model Source | Reputable open-source models from established providers (e.g., Meta LLaMA, Mistral AI) |
| Model Hosting | Downloaded and served locally on Streamdiver infrastructure |
| Model Selection | Architecture is model-agnostic; models can be exchanged without platform changes |
| Model Training | No fine-tuning or training on customer data |
| Model Updates | Models are updated periodically to leverage improvements in the open-source ecosystem |
The specific model in use may change as the open-source ecosystem evolves. The architecture is designed to be model-agnostic — the inference API remains stable regardless of the underlying model.
5. Network Isolation
The LLM inference servers are not publicly accessible. They are exclusively reachable within the Streamdiver Tailscale tailnet — a private, encrypted overlay network.
| Property | Details |
|---|---|
| Network | Streamdiver Tailscale tailnet (private overlay) |
| Encryption | WireGuard (ChaCha20-Poly1305, Curve25519 key exchange) |
| Access Control | Identity-based; only authorized Streamdiver services can reach the LLM endpoints |
| Public Exposure | None — no public IP, no public DNS, no internet-facing ports |
| Monitoring | All connections are logged within the tailnet control plane |
No traffic to or from the LLM servers traverses the public internet.
6. Data Flow
6.1 Inbound (Platform → LLM)
- The RAG engine within the Streamdiver platform constructs a prompt (context + query).
- The prompt is sent to the vLLM server via the encrypted Tailscale tunnel.
- The vLLM server processes the prompt in RAM and returns the result.
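The inbound flow above can be sketched as follows. The context-assembly step is a simplified stand-in for the actual RAG engine; the prompt template and function names are illustrative assumptions.

```python
def build_prompt(retrieved_chunks: list[str], query: str) -> str:
    """Assemble a RAG prompt: retrieved context followed by the user's query."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt(
    ["Chunk one about the video transcript.", "Chunk two with speaker metadata."],
    "Who speaks in the second segment?",
)
# The assembled prompt is then POSTed to the vLLM server over the encrypted
# Tailscale tunnel, processed entirely in RAM, and the completion is returned.
```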
6.2 Outbound (LLM → External)
None. The LLM server has no outbound internet access. It cannot:
- Send data to any external AI service
- Phone home to any vendor
- Transmit telemetry or usage data
- Access any resource outside the Tailscale tailnet
6.3 Data Lifecycle on the LLM Server
| Phase | Data Location | Duration |
|---|---|---|
| Request received | RAM (prompt + context) | Milliseconds to seconds |
| Inference | GPU VRAM + RAM | Duration of inference |
| Response sent | Transmitted via WireGuard tunnel | Immediate |
| After response | Prompt and context are released from memory | Immediate |
No data is written to disk at any point during inference. The LLM server is stateless — it retains no customer data between requests.
7. Hosting
LLM inference workloads run on dedicated GPU servers provided by multiple European hosting partners. The specific provider is selected based on capacity and workload requirements; all providers meet the same baseline security criteria.
| Property | Details |
|---|---|
| Providers | Exoscale (Switzerland/Austria), Hetzner Online GmbH (Germany), Verda (Finland) |
| Locations | Austria, Germany, Finland |
| Provider Jurisdictions | Swiss, Austrian, German, and Finnish law — EU/EEA member states plus Switzerland (covered by an EU adequacy decision under GDPR Art. 45) |
| Infrastructure Certification | ISO 27001 |
| Server Type | Dedicated GPU servers (not shared / not multi-tenant) |
| Physical Security | Biometric access control, 24/7 surveillance, individual rack locking |
| CLOUD Act Exposure | None — all providers are European companies with no US ownership |
8. CLOUD Act Assessment
The CLOUD Act compels providers subject to US jurisdiction to disclose data regardless of where it is stored. The following assessment covers all layers of the LLM infrastructure:
| Layer | Component | Jurisdiction | CLOUD Act Risk |
|---|---|---|---|
| Hosting Providers | Exoscale, Hetzner, Verda | Switzerland/Austria, Germany, Finland | None |
| Inference Framework | vLLM (open source) | N/A (open source, self-hosted) | None |
| Models | Open-source models | N/A (open source, self-hosted) | None |
| Network | Tailscale | Canada (Tailscale Inc.) | See below |
| Encryption | WireGuard | N/A (open source) | None |
Tailscale Note: Tailscale Inc. is a Canadian company. The Tailscale control plane coordinates key exchange and ACL policies but does not relay or have access to data traffic — all data flows directly between nodes via WireGuard tunnels with end-to-end encryption. Tailscale cannot decrypt the traffic. Even in the event of a theoretical legal request, only connection metadata (which nodes are connected) would be available, not content.
Summary: No US company has access to data processed by or stored on the LLM infrastructure. There is no CLOUD Act exposure.
9. Customer Data Usage
| Question | Answer |
|---|---|
| Is customer data used to train or fine-tune models? | No. Customer data is never used to train or fine-tune models. |
| Is customer data retained after inference? | No. The LLM server is stateless; data is released from memory immediately after response. |
| Is customer data shared with any third party? | No. No data leaves the Streamdiver infrastructure. |
| Are inference prompts or results logged? | No. No prompts or inference results are written to logs on the LLM server. |