Confidential Computing for LLM Inference: How TEEs and Encryption-in-Use Protect AI Models and Data

When you ask an LLM a question about your medical records, financial history, or proprietary business data, where does that data go? Most systems today process it in plain text, even if it’s encrypted at rest or in transit. That’s the gap. Confidential computing closes it by keeping data encrypted while it’s being used. This isn’t theory. It’s happening right now in hospitals, banks, and government agencies running LLMs on sensitive information without ever exposing it to the host system.

Why Traditional AI Security Isn’t Enough

Most companies think they’re safe if they encrypt data at rest and use TLS for network traffic. But when an LLM runs, it needs to read the input, load the model weights, and generate a response, all in memory. That memory is visible to the operating system, the hypervisor, the cloud provider, and even privileged admins. If someone compromises the host, they can steal your model or your customer data. This is the AI privacy paradox: you need powerful AI on sensitive data, but you can’t risk exposing either.

Enter confidential computing. Instead of trusting software layers, it uses hardware-level isolation. Think of it like a vault inside your server. Only the LLM can open it. Everything else, from the OS and cloud infrastructure to root users, is locked out. The technology behind this? Trusted Execution Environments (TEEs).

What Are Trusted Execution Environments (TEEs)?

TEEs are secure areas inside a processor that protect code and data from being viewed or modified by anything outside. They’re not software. They’re built into the silicon. Intel’s TDX, AMD’s SEV-SNP, and NVIDIA’s GPU-based TEEs all do the same thing: create a cryptographically sealed environment where only authorized code can run, and memory is automatically encrypted.

Here’s how it works for LLM inference:

  1. Your encrypted prompt is sent to the cloud.
  2. The TEE proves to the client it’s genuine (remote attestation).
  3. Only then does it decrypt the prompt using a key only it holds.
  4. The LLM runs inside the TEE, with model weights, inputs, and outputs all encrypted in memory.
  5. The response is re-encrypted before leaving the secure zone.
No one outside the TEE sees the raw data. Not the cloud provider. Not the host OS. Not even a compromised admin. This is the only way to guarantee data-in-use protection.
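To make the flow concrete, here’s a minimal Python sketch of the five steps above. It is an illustration only: the attestation check, the model call, and the pre-shared Fernet key are placeholders invented for the example, since a real deployment verifies a hardware-signed attestation report and provisions keys through a key broker.

```python
# Minimal sketch of the five-step flow above (requires the third-party
# "cryptography" package). verify_attestation() and run_model() are
# placeholders: a real client validates a hardware-signed attestation
# report, and keys are provisioned by a key broker, not pre-shared.
from cryptography.fernet import Fernet

# Symmetric key that, in this sketch, only the TEE and the verified client hold.
tee_key = Fernet(Fernet.generate_key())

def verify_attestation(report: dict) -> bool:
    # Step 2: the client checks the enclave's measurement before trusting it.
    return report.get("measurement") == "expected-enclave-hash"

def run_model(prompt: str) -> str:
    # Step 4: stand-in for LLM inference running inside the TEE.
    return f"response to: {prompt}"

def tee_inference(encrypted_prompt: bytes) -> bytes:
    # Steps 3-5: decrypt, run inference, and re-encrypt, all inside the TEE.
    prompt = tee_key.decrypt(encrypted_prompt).decode()
    return tee_key.encrypt(run_model(prompt).encode())

# Steps 1-2 (client side): verify the enclave, then send only ciphertext.
if verify_attestation({"measurement": "expected-enclave-hash"}):
    encrypted_prompt = tee_key.encrypt(b"Summarize this patient record ...")
    encrypted_reply = tee_inference(encrypted_prompt)
    print(tee_key.decrypt(encrypted_reply).decode())
```

The detail that matters is the ordering: in a real deployment the client never holds the TEE’s key up front; the key is released only after attestation succeeds, which is what makes the rest of the flow trustworthy.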

NVIDIA’s Breakthrough: GPU TEEs for LLMs

Early TEEs ran on CPUs. Intel SGX and AMD SEV-SNP could handle small models, but they choked on LLMs. Loading a 70B-parameter model into an SGX enclave took over 40 minutes. Performance dropped by 20% or more. That made real-time inference impossible.

NVIDIA changed that in late 2023 with confidential computing on H100 GPUs. The H100 doesn’t just accelerate AI; it protects it. Its TEE extends hardware protection to everything the GPU handles, including model weights, intermediate activations, and input prompts. Performance overhead? Just 1-5%. That’s near-native speed with enterprise-grade security.

The H200 and the new Blackwell B200 (released December 2025) take it further. They support 200B+ parameter models with under 3% overhead. This isn’t a feature. It’s becoming a requirement for anyone running large LLMs on sensitive data.

[Illustration: a GPU interlocked with a key, representing mutual attestation for secure model loading.]

How Cloud Providers Implement It

AWS, Azure, and Red Hat each took different paths:

  • AWS Nitro Enclaves: Uses lightweight VMs to isolate workloads. Great for CPU-based inference, but lacks native GPU TEE support. You have to build workarounds for high-performance models.
  • Azure Confidential Computing: Adds application-level encryption on top of hardware TEEs. Prompts are encrypted before they even reach the server. The TEE decrypts them, runs inference, and re-encrypts the output. Used by major financial firms for fraud detection.
  • Red Hat OpenShift with CVMs: Brings confidential computing into Kubernetes. You deploy LLMs as containers inside Confidential Virtual Machines. It’s the most cloud-native approach, ideal for teams already using OpenShift.
The real differentiator? Secure model loading. You can’t just upload a model file to a TEE. The enclave must prove it’s authorized. The model provider must prove its key is valid. This is called mutual attestation. NVIDIA and Phala Network pioneered this. Without it, a bad actor could trick the TEE into loading a fake model.
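To illustrate that handshake, here’s a rough Python sketch of the two checks. The measurement string, the HMAC tag, and the pre-shared provider key are all assumptions made for the example; production systems rely on hardware-signed attestation reports and asymmetric signatures issued through a key-broker service.

```python
# Hypothetical sketch of mutual attestation before model loading. The
# measurement string, HMAC tag, and pre-shared provider key are stand-ins;
# real systems use hardware-signed attestation reports and asymmetric
# signatures from a key-broker service.
import hashlib
import hmac

EXPECTED_MEASUREMENT = "sha256-of-approved-inference-image"
PROVIDER_KEY = b"model-provider-secret"  # assumption for the sketch only

def provider_releases_key(attestation_report: dict, wrapped_model_key: bytes):
    # Check 1: the provider releases the model key only to an enclave whose
    # measurement matches an approved build.
    if attestation_report.get("measurement") != EXPECTED_MEASUREMENT:
        return None
    return wrapped_model_key

def enclave_accepts_key(wrapped_model_key: bytes, provider_tag: bytes) -> bool:
    # Check 2: the enclave accepts the key only if it carries a valid tag
    # from the expected provider, so it cannot be tricked into loading a
    # substituted model.
    expected = hmac.new(PROVIDER_KEY, wrapped_model_key, hashlib.sha256).digest()
    return hmac.compare_digest(expected, provider_tag)

wrapped_key = b"encrypted-model-key-bytes"
tag = hmac.new(PROVIDER_KEY, wrapped_key, hashlib.sha256).digest()

released = provider_releases_key({"measurement": EXPECTED_MEASUREMENT}, wrapped_key)
if released is not None and enclave_accepts_key(released, tag):
    print("Mutual attestation passed: safe to decrypt the model inside the TEE.")
```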

Real-World Use Cases

Healthcare is leading adoption. Leidos deployed AWS Nitro Enclaves for processing patient records. They kept 98.7% accuracy while ensuring PHI stayed encrypted during inference. No more risking HIPAA violations.

European insurers now run confidential LLM assistants for claims processing. One company reduced breach risks by 92% and kept customer satisfaction at 92%. The catch? It took six months to build, three times longer than a standard deployment.

Banks use it for credit scoring on private financial data. Law firms run confidential models to analyze contracts without exposing client documents. Even government agencies are using it for classified document summarization.

The pattern? Regulated industries. Industries where data exposure means fines, lawsuits, or lost trust. Confidential computing isn’t optional anymore; it’s compliance.

Challenges and Limitations

It’s not magic. There are real hurdles:

  • Model size: Loading a 70B+ model into a CPU TEE can take over 40 minutes. GPU TEEs fix this, but only if you have NVIDIA H100s or newer.
  • Complexity: Setting up attestation chains, managing keys, and integrating with MLOps pipelines is not plug-and-play. Teams report an 8-12 week learning curve.
  • Hardware lock-in: NVIDIA’s solution is the fastest, but it’s not available everywhere. If your cloud region doesn’t have H100s, you’re stuck with slower CPU TEEs.
  • Supply chain risk: Even if your TEE is secure, what if the model was poisoned during training? Or if the key management system is breached?
One financial firm tried Intel SGX for a 30B model. It took 47 minutes to load. Their real-time chatbot became a 5-minute wait. They abandoned it.

[Illustration: a clinician and an AI separated by a barrier, with encrypted patient data held inside a geometric lattice.]

What You Need to Get Started

If you’re considering confidential computing for LLM inference, here’s your roadmap:

  1. Choose your hardware: If you need speed and scale, go with NVIDIA H100 or B200. If GPU TEEs aren’t available in your environment, fall back to CPU-based TEEs with Intel TDX or AMD SEV-SNP.
  2. Select your platform: Use Azure for integrated encryption, AWS for isolation, or Red Hat for Kubernetes-native deployment.
  3. Design for mutual attestation: Your model provider must authenticate the TEE, and the TEE must authenticate the model. No exceptions.
  4. Integrate with your MLOps pipeline: Automate model deployment, key rotation, and attestation checks. Manual processes break security.
  5. Test performance: Benchmark your model with and without the TEE (a minimal benchmarking sketch follows below). If overhead exceeds 10%, reconsider your hardware choice.
Most teams start small. Pick one high-risk use case: a customer support bot handling PII, or a document analyzer for legal contracts. Prove it works. Then scale.
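To make the benchmarking step concrete, here’s a small Python harness you could adapt. The run_inference() function is a placeholder for calls to your own TEE and non-TEE deployments, and the latencies it simulates are arbitrary.

```python
# Benchmarking sketch for step 5: measure the same workload on a TEE-enabled
# deployment and a standard one, then compare against the 10% threshold.
# run_inference() is a placeholder; point it at your real endpoints.
import statistics
import time

def run_inference(prompt: str, confidential: bool) -> str:
    # Placeholder: substitute calls to your TEE and non-TEE deployments.
    time.sleep(0.105 if confidential else 0.100)
    return "ok"

def median_latency(confidential: bool, runs: int = 20) -> float:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference("benchmark prompt", confidential)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

baseline = median_latency(confidential=False)
tee = median_latency(confidential=True)
overhead = (tee - baseline) / baseline * 100
print(f"baseline {baseline * 1000:.1f} ms, TEE {tee * 1000:.1f} ms, overhead {overhead:.1f}%")
if overhead > 10:
    print("Overhead exceeds 10%: reconsider your hardware choice.")
```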

The Future Is Confidential

Analysts at Gartner and IDC project that by 2026, 65% of enterprise LLM deployments handling sensitive data will use confidential computing, rising to 90% in healthcare and finance by 2027. Their conclusion is the same: there’s no viable alternative. Encryption-at-rest and TLS aren’t enough. Software-only protections can be bypassed. Only hardware-enforced TEEs deliver verifiable security.

The market is growing fast. It was worth $1.7 billion in late 2025. By 2027, it’ll hit $8.3 billion. Cloud providers are betting billions on it. Startups like Phala Network and Tinfoil Security are building the missing pieces: secure key transfer, model attestation, and orchestration tools.

The next step? Standardization. The Confidential Computing Consortium launched an LLM Working Group in September 2025 to create common APIs for secure model serving. That’ll make adoption easier. But until then, you’re on your own.

Is This Right for You?

Ask yourself:

  • Are you processing personal, financial, medical, or proprietary data with LLMs?
  • Could a data breach cost you millions in fines or reputation?
  • Do you own a model you don’t want competitors to copy?
If you answered yes to any of these, you’re already behind. Confidential computing isn’t a luxury. It’s the baseline for enterprise AI.

Start with one workload. Test it on an H100 instance. Measure the overhead. See the difference. Then expand. The future of secure AI doesn’t run in plain text. It runs in a vault.

What’s the difference between encryption-at-rest and encryption-in-use?

Encryption-at-rest protects data when it’s stored, such as on a hard drive. Encryption-in-use protects data while it’s being processed, like when an LLM reads a prompt or generates a response. For AI workloads, confidential computing is currently the only practical way to achieve encryption-in-use.
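As a rough illustration, the Python sketch below shows the gap: the stored blob is safe, but the plaintext reappears in ordinary process memory the moment it’s decrypted for processing. The record contents and key handling are invented for the example.

```python
# Sketch of why encryption-at-rest alone falls short (requires the
# third-party "cryptography" package). The stored blob is protected, but as
# soon as it is decrypted for inference the plaintext sits in ordinary
# process memory, visible to the host. A TEE performs this same decryption
# inside hardware-protected memory instead.
from cryptography.fernet import Fernet

storage_key = Fernet(Fernet.generate_key())

# At rest: the record is encrypted on disk or in object storage.
stored_blob = storage_key.encrypt(b"patient: Jane Doe, diagnosis: ...")

# In use, without a TEE: inference needs the plaintext in regular RAM,
# where the OS, the hypervisor, and privileged admins can read it.
plaintext_in_ram = storage_key.decrypt(stored_blob)
print(plaintext_in_ram.decode())  # exposed to the host while being processed
```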

Can I use confidential computing on any cloud provider?

Yes, but with limits. AWS, Azure, and Google Cloud offer confidential computing services. However, GPU-based confidential computing (the fastest option) is only available on NVIDIA H100, H200, or B200 instances. Not all cloud regions support them yet. Check your provider’s documentation for availability.

Does confidential computing slow down LLMs?

It depends. On CPUs, overhead can be 15-25%. On NVIDIA H100/B200 GPUs, it’s just 1-5%. For most real-world applications, that’s negligible. The trade-off is worth it: you get near-native speed with full data protection.

How do I securely load a large LLM into a TEE?

You need mutual attestation. The TEE must prove it’s authentic to the model provider. The model provider must prove its key is valid to the TEE. Only then is the encrypted model loaded and decrypted inside the secure environment. NVIDIA and Phala Network use this method. Without it, you risk loading a malicious model.

Is confidential computing only for big companies?

No. While early adopters were enterprises, tools like Red Hat OpenShift and Azure’s managed services are making it accessible to mid-sized teams. Start small: protect one sensitive use case. You don’t need a huge team or budget to begin.