Disk Encryption in CVMs

December 12, 2024 at 8:40 pm

How can an untrusted hypervisor load the disk/image of a CVM in a trustworthy way?
How can the keys be passed from the trusted user to the VM securely?

This document (Confidential Computing secrets) specifies that:

Confidential Computing (coco) hardware such as AMD SEV (Secure Encrypted Virtualization) allows guest owners to inject secrets into the VMs memory without the host/hypervisor being able to read them. In SEV, secret injection is performed early in the VM launch process, before the guest starts running.

However, it's still not clear to me how the secrets are passed to the VM securely. I guess this can be a bit different for SEV and TDX.

Fortunately, the Linux kernel documentation explains the details here for SEV.

Notably, there is a command called KVM_SEV_LAUNCH_SECRET, documented like this:

KVM_SEV_LAUNCH_SECRET
The KVM_SEV_LAUNCH_SECRET command can be used by the hypervisor to inject secret data after the measurement has been validated by the guest owner.
Parameters (in): struct kvm_sev_launch_secret
Returns: 0 on success, -negative on error

The structure looks like this:

struct kvm_sev_launch_secret {
        __u64 hdr_uaddr;        /* userspace address containing the packet header */
        __u32 hdr_len;

        __u64 guest_uaddr;      /* the guest memory region where the secret should be injected */
        __u32 guest_len;

        __u64 trans_uaddr;      /* the hypervisor memory region which contains the secret */
        __u32 trans_len;
};

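So there seems to be a mechanism in which the secret is staged in a hypervisor memory region (trans_uaddr) and then injected into a guest memory region (guest_uaddr), once the guest owner has validated the launch measurement.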
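To get a feel for what the hypervisor side has to fill in, here is a minimal Python sketch that mirrors the struct above with ctypes. It only builds the parameter block and does not open /dev/sev or issue any ioctl. The helper name build_launch_secret and the sample byte strings are made up for illustration; in a real flow the header and packet would come from the guest owner.

```python
import ctypes


class KvmSevLaunchSecret(ctypes.Structure):
    """Field-for-field mirror of struct kvm_sev_launch_secret quoted above."""
    _fields_ = [
        ("hdr_uaddr", ctypes.c_uint64),    # userspace address containing the packet header
        ("hdr_len", ctypes.c_uint32),
        ("guest_uaddr", ctypes.c_uint64),  # guest memory region where the secret is injected
        ("guest_len", ctypes.c_uint32),
        ("trans_uaddr", ctypes.c_uint64),  # hypervisor memory region which contains the secret
        ("trans_len", ctypes.c_uint32),
    ]


def build_launch_secret(header: bytes, packet: bytes, guest_uaddr: int, guest_len: int):
    """Hypothetical helper: stage the buffers and fill in the parameter block.

    Returns the struct plus the backing buffers, which must stay alive
    for as long as the struct is in use.
    """
    hdr_buf = (ctypes.c_char * len(header)).from_buffer_copy(header)
    trans_buf = (ctypes.c_char * len(packet)).from_buffer_copy(packet)
    params = KvmSevLaunchSecret(
        hdr_uaddr=ctypes.addressof(hdr_buf),
        hdr_len=len(header),
        guest_uaddr=guest_uaddr,
        guest_len=guest_len,
        trans_uaddr=ctypes.addressof(trans_buf),
        trans_len=len(packet),
    )
    return params, (hdr_buf, trans_buf)


# Placeholder bytes and addresses only; real values come from the guest owner and the VMM.
params, buffers = build_launch_secret(b"\x00" * 16, b"\x01" * 32, guest_uaddr=0x1000, guest_len=32)
```

Note that the sketch says nothing about how the packet is protected in transit; that part is exactly what I still need to read up on.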

Labels and event processes in the Asbestos operating system

November 30, 2024 at 9:59 am

Abstract

Asbestos, a new prototype operating system, provides novel labeling and isolation mechanisms that help contain the effects of exploitable software flaws. Applications can express a wide range of policies with Asbestos's kernel-enforced label mechanism, including controls on inter-process communication and system-wide information flow. A new event process abstraction provides lightweight, isolated contexts within a single process, allowing the same process to act on behalf of multiple users while preventing it from leaking any single user's data to any other user. A Web server that uses Asbestos labels to isolate user data requires about 1.5 memory pages per user, demonstrating that additional security can come at an acceptable cost.

P4CONTROL: Line-Rate Cross-Host Attack Prevention via In-Network Information Flow Control Enabled by Programmable Switches and eBPF

November 26, 2024 at 4:35 pm

This is an interesting work. It enforces Decentralized Information Flow Control (DIFC) using network switches and the OS (via eBPF). The authors designed a new language to express DIFC policies, and the enforcement introduces very low overhead.

Although the project contains only 4K LoC, the authors conducted an extensive evaluation, part of which was done in a simulated environment.

Questions

Since P4Control enforces DIFC at the file level, is the tag space large enough (e.g., only 256 different tags)?
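To make the tag-space question concrete, here is a minimal Python sketch of a simplified DIFC secrecy check plus a fixed-width tag allocator. It is not P4Control's policy language or implementation: the TagAllocator class, the can_flow rule (source secrecy tags must be a subset of the destination's), and the file names are assumptions for illustration. Only the figure of 256 tags comes from the note above.

```python
class TagAllocator:
    """Hands out tag IDs from a fixed-width space (8 bits gives 256 tags)."""

    def __init__(self, bits: int = 8):
        self.capacity = 1 << bits
        self._by_name = {}

    def tag_for(self, name: str) -> int:
        """Return the tag ID for `name`, allocating a fresh one if needed."""
        if name not in self._by_name:
            if len(self._by_name) >= self.capacity:
                raise RuntimeError("tag space exhausted")
            self._by_name[name] = len(self._by_name)
        return self._by_name[name]


def can_flow(src_secrecy: frozenset, dst_secrecy: frozenset) -> bool:
    """Simplified DIFC secrecy rule: data may only flow to a destination
    that carries at least the source's secrecy tags."""
    return src_secrecy <= dst_secrecy


# One tag per file: the 257th distinct file no longer gets a tag.
allocator = TagAllocator(bits=8)
tags = [allocator.tag_for(f"/data/file{i}") for i in range(256)]
```

If every file needs its own tag, the allocator runs dry at the 257th file, so a real deployment would presumably have to group files under shared tags.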

LM Side-channel

November 3, 2024 at 4:39 pm

What Was Your Prompt? A Remote Keylogging Attack on AI Assistants Usenix'24, packet length
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems arxiv, KV cache

Decentralized Information Flow Control

October 22, 2024 at 3:46 pm

Theoretical Foundations

A decentralized model for information flow control

Important Papers

Information flow control for standard OS abstractions

Abstract

Security inspection and testing require experts in security who think like an attacker. Security experts need to know code locations on which to focus their testing and inspection efforts. Since vulnerabilities are rare occurrences, locating vulnerable code locations can be a challenging task. We investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations. If so, security experts can use this prediction to prioritize security inspection and testing efforts. The metrics we investigated fall into three categories: complexity, code churn, and developer activity metrics. We performed two empirical case studies on large, widely used open-source projects: the Mozilla Firefox web browser and the Red Hat Enterprise Linux kernel. The results indicate that 24 of the 28 metrics collected are discriminative of vulnerabilities for both projects. The models using all three types of metrics together predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects. Compared to a random selection of files for inspection and testing, these models would have reduced the number of files and the number of lines of code to inspect or test by over 71 and 28 percent, respectively, for both projects.

Paper

Discussion

This is an interesting work that analyzes which factors can serve as indicators for predicting which file(s) in a project may contain vulnerabilities. But there are still several questions:

Can these conclusions/observations be applied to other projects/languages?
Is there a universal threshold for predicting whether a source file contains a bug, or are these criteria project-specific?
If single factors already do well for prediction, why do we (in practice) consider multiple factors?

Software Debloating

October 4, 2024 at 10:08 pm

Diving into this topic, I found that there are plenty of questions regarding this idea.

Questions

What are the motivations?
- Reducing the number of available gadgets seems like a good motivation, but it's not clear whether this actually enhances software security. Other factors seem to matter, e.g., whether these gadgets are exploitable.
How to measure the effectiveness of debloating?
- This is a more complex question: 1) how can we ensure that the trimmed software components are not used by users? and 2) how can we claim that security is indeed enhanced?
- The metrics seem to be too indirect. For example, a reduced #gadgets cannot represent how many exploitable vulnerabilities are fixed.
It seems to me that this idea aligns with conditional compilation. What are the differences?

Papers

Is Less Really More? Towards Better Metrics for Measuring Security Improvements Realized Through Software Debloating

This work points out that although #gadgets can be reduced by some debloating tools, the tools sometimes introduce new gadgets that enhance the expressiveness of potential attacks. However, it's still not clear to me whether these introduced gadgets are indeed exploitable.

A Broad Comparative Evaluation of Software Debloating Tools

This work is more interesting. It seems that nearly all existing software debloating tools are effective for neither security nor performance.

Potential Directions

Maybe LLMs can aid this? Found an interesting paper: LPR: Large Language Models-Aided Program Reduction

Evaluating Fuzz Testing

October 3, 2024 at 6:11 pm

Abstract

Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.

Paper

This is a very interesting paper. It points out some "unscientific" aspects in fuzzing, including:

No unified test suite, and varied fuzzing targets (including different versions)
Important factors such as execution time and seed selection are handled inconsistently

They suggest using more rigorous statistical methodology (e.g., statistical tests) to support claims that one fuzzer beats another.
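As a rough illustration of what such a comparison could look like, here is a small pure-Python sketch that computes the Mann-Whitney U statistic, the Vargha-Delaney A12 effect size, and an exact two-sided permutation p-value for small samples. The choice of these particular statistics and the bug counts below are my own assumptions for illustration, not numbers from the paper.

```python
from itertools import combinations


def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


def a12(xs, ys):
    """Vargha-Delaney effect size: estimated probability that a run from xs beats a run from ys."""
    return mann_whitney_u(xs, ys) / (len(xs) * len(ys))


def exact_permutation_p(xs, ys):
    """Exact two-sided permutation p-value on U (only feasible for small samples)."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    half = n * m / 2  # expected U when both groups behave the same
    observed = abs(mann_whitney_u(xs, ys) - half)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - half) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical bug counts from five independent runs of each fuzzer.
fuzzer_a = [12, 15, 14, 10, 13]
fuzzer_b = [9, 11, 8, 12, 10]
effect = a12(fuzzer_a, fuzzer_b)
p_value = exact_permutation_p(fuzzer_a, fuzzer_b)
```

The point is that a single run per fuzzer gives none of this: both the effect size and the p-value need repeated runs under the same seeds and timeouts.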

Interestingly, a 2024 SoK paper discusses a similar topic: SoK: Prudent Evaluation Practices for Fuzzing.

AddressSanitizer: A fast address sanity checker

October 2, 2024 at 5:07 pm

Abstract

Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both.

This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware.

AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.

Goods

Tracking of valid memory via shadow mapping
Checks done via instrumentation
Stack/heap object awareness
No FP

Bads

Still mild overhead
Can miss some bugs (redzones might be skipped)
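The shadow mapping and the redzone caveat can be sketched in a few lines of Python. This is a toy simulation, not ASan's implementation: the ShadowMemory class, the SHADOW_OFFSET constant, and the addresses are assumptions for illustration. The encoding follows the paper's scheme of one shadow byte per 8 application bytes (0 = all 8 bytes addressable, k in 1..7 = only the first k bytes addressable, negative = poisoned).

```python
SHADOW_SCALE = 3            # one shadow byte describes 8 application bytes
SHADOW_OFFSET = 0x7FFF8000  # assumed constant; the real value is platform-specific
POISONED = -1


def shadow_address(addr: int) -> int:
    """Map an application address to the address of its shadow byte."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


class ShadowMemory:
    def __init__(self):
        self._shadow = {}  # shadow address -> shadow byte; missing means 0 (addressable)

    def _poison(self, start: int, end: int) -> None:
        for addr in range(start, end, 8):
            self._shadow[shadow_address(addr)] = POISONED

    def allocate(self, addr: int, size: int, redzone: int = 16) -> None:
        """Register an object at `addr` and poison a redzone on both sides."""
        if addr % 8 or redzone % 8:
            raise ValueError("addr and redzone must be 8-byte aligned")
        end = addr + ((size + 7) // 8) * 8  # object end rounded up to a granule
        self._poison(addr - redzone, addr)
        self._poison(end, end + redzone)
        for granule in range(addr, end, 8):
            remaining = addr + size - granule
            self._shadow[shadow_address(granule)] = 0 if remaining >= 8 else remaining

    def is_bad_access(self, addr: int, size: int) -> bool:
        """Check an access of `size` bytes (size <= 8) that stays within one granule."""
        k = self._shadow.get(shadow_address(addr), 0)
        if k == 0:
            return False
        if k < 0:
            return True
        return (addr & 7) + size - 1 >= k


memory = ShadowMemory()
memory.allocate(0x1000, 13)  # 13-byte object with 16-byte redzones
```

An access just past the object (0x100d) or inside a redzone is flagged, while an access that jumps far beyond the redzone (e.g., 0x1040) lands in memory the shadow considers addressable and goes unnoticed, which is the "redzones might be skipped" weakness.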

Questions

Why is this work SO influential?

CC Guest GPU VM Setup

October 1, 2024 at 4:33 pm

After Reboot

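# On the host: load vfio-pci and bind the GPU to it (10de is the NVIDIA PCI vendor ID, 2331 the device ID)
sudo modprobe vfio-pci
sudo sh -c "echo 10de 2331 > /sys/bus/pci/drivers/vfio-pci/new_id"
# Optionally, on the host: switch the H100's CC mode (here: devtools) and reset the GPU afterwards
sudo python3 ./nvidia_gpu_tools.py --gpu-name=H100 --set-cc-mode=devtools --reset-after-cc-mode-switch
# Launch the VM, then inside the VM: set the GPU ready state
sudo nvidia-smi conf-compute -srs 1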

Information Flow Tracking for Heterogeneous Compartmentalized Software

September 19, 2024 at 3:37 pm

Paper

Abstract

We are now seeing increased hardware support for improving the security and performance of privilege separation and compartmentalization techniques. Today, developers can benefit from multiple compartmentalization mechanisms such as process-based sandboxes, trusted execution environments (TEEs)/enclaves, and even intra-address space compartments (i.e., intra-process or intra-enclave). We dub such a computing model a “hetero-compartment” environment and observe that existing system stacks still assume single-compartment models (i.e., user space processes), leading to limitations in using, integrating, and monitoring heterogeneous compartments from a security and performance perspective. We introduce Deluminator, a set of OS abstractions and a userspace framework to enable extensible and fine-grained information flow tracking in hetero-compartment environments. Deluminator allows developers to securely use and combine compartments, define security policies over shared system resources, and audit policy violations and perform digital forensics across heterogeneous compartments. We implemented Deluminator on Linux-based ARM and x86-64 platforms, which supports diverse compartment types ranging from processes, SGX enclaves, TrustZone Trusted Apps (TAs), and intra-address space compartments. Our evaluation shows that our kernel and hardware-assisted approach results in a reasonable overhead (on average 7-29%) that makes it suitable for real-world applications.

Machine Learning with Confidential Computing: A Systematization of Knowledge

September 17, 2024 at 5:03 pm

This is an interesting work.

The idea of combining GPU record and replay with a TEE can be traced back to a 2021 arXiv paper: Safe and Practical GPU Acceleration in TrustZone. This should be the arXiv version of the ASPLOS'22 paper GPUReplay: a 50-KB GPU stack for client ML.

SoK: SGX.Fail: How Stuff Gets eXposed

September 17, 2024 at 5:01 pm

Industry Connecting

September 17, 2024 at 4:59 pm

check this.

Attack in Compositional AI Systems

May 8, 2024 at 9:02 pm

Research Papers

Attacks

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems
A First Look at GPT Apps: Landscape and Vulnerability steals system prompt/files
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins
Demystifying RCE Vulnerabilities in LLM-Integrated Apps RCE based on prompts

Defense

SECGPT: An Execution Isolation Architecture for LLM-Based Systems

MISC

Whispers in the Machine: Confidentiality in LLM-integrated Systems

Previous Attack

Jailbreak

Proof Checking/Verification for SMT Solvers

May 2, 2024 at 1:36 pm

Survey

Projects

SMTCoq

Verifiers

Proof Formats

Work Based on Proof Checking

ZKSMT: A VM for Proving SMT Theorems in Zero Knowledge

PCC: Conventional and Foundational

May 2, 2024 at 1:36 pm

Conventional

Foundational

Exploiting Unprotected I/O Operations in AMD’s Secure Encrypted Virtualization

February 16, 2024 at 7:58 pm

USENIX'19 Paper

System

Attacks

Monitoring nPT (nested page table) and gPT (guest page table)
Page faults => a list of accessed pages => find the actual address of the target page
Decryption oracle: move content of the ciphertext to the SWIOTLB
Let it be decrypted

Technical Details

Monitoring/MITM the DMA operations
Pattern matching to find the accessed address in private memory
IOremap to replace ciphertext
QEMU notifies the VM about DMA write. Does it notify the device to read DMA?

Building GPU TEEs using CPU Secure Enclaves with GEVisor

February 8, 2024 at 4:07 pm

SoCC'23

System

Trust TPM & Secure Boot
Security Monitor implemented at the VMX-root level; the model (with its implementation) is formally verified.
Enforces access control by trapping instructions

Covered Attack Surfaces

DMA buffer
MMIO mapping
GPU context

Attack vectors targeting these surfaces:

Malicious peripherals
Concurrent CPU Access (specific to GEVisor)

Question

How to ensure the integrity of Hypercalls? (i.e., protect the ring buffer)
How to prove active attackers would be detected by GEVisor? (and how to define active attackers?)
Why does the formal verification pass for GEVisor with flaws? (or what does pass mean for GEVisor?)

Honeycomb: Secure and Efficient GPU Executions via Static Validation

February 8, 2024 at 1:28 pm

OSDI'23

Huge implementation effort
Type 1 hypervisor
Static analysis (validator-based)
Still need to rely on the driver (but not fully trust it)

SinClave: Hardware-assisted Singletons for TEEs

February 5, 2024 at 4:54 pm

PDF Middleware ’23

Attack

If the software’s configuration that is not reflected in the enclave measurement (§ 3.3) dictates the to-be-loaded program, the adversary can simply start a report server implementation.

Is it correct? Yes, at least for SCONE.

Defense

The basic idea behind our solution is that system software adds an additional page, the instance page (see Figure 5), to the enclave dynamically during enclave construction. This page, in particular, contains an attestation token, and the verifier’s cryptographic identity. The attestation token is unique and generated by the verifier in a previous step.

How can SinClave ensure that such a token is securely protected (i.e., not leaked to the adversary)?

Detection: Logic bugs, inconsistency, typo, etc.

October 22, 2024 at 3:40 pm

Semantic Inconsistency

Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code

This is a very interesting paper, maybe the earliest work in this area. The idea is simple: some operations are based on implicit assumptions. For example, dereference implies a valid pointer.

Detecting Missing-Check Bugs via Semantic- and Context-Aware Criticalness and Constraints Inferences PDF

Focuses solely on missing checks in the Linux kernel.

APISAN: Sanitizing API Usages through Semantic Cross-checking PDF

To find API usage errors, APISAN automatically infers semantic correctness, called semantic beliefs, by analyzing the source code of different uses of the API.

the more API patterns developers use in similar contexts, the more confidence we have about the correct API usage.

This is a probabilistic approach. Symbolic execution can help to construct a "semantic belief" of the outcome (and pre- and post- conditions) of executing an API, and such construction will be more and more precise as there are more samples in the dataset.

This work is based mainly on relaxed symbolic execution.
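The cross-checking idea can be sketched without any symbolic execution: collect how each call site uses an API, treat the dominant behavior as the "belief", and report the deviants. This is a deliberately simplified statistical sketch, not APISAN's analysis. The infer_deviants function, its thresholds, and the observation tuples are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def infer_deviants(observations, min_support=3, min_ratio=0.8):
    """Report call sites that deviate from the majority usage of an API.

    observations: iterable of (api_name, call_site, behavior) tuples.
    Returns a list of (api_name, call_site, majority_behavior, confidence).
    """
    by_api = defaultdict(list)
    for api, site, behavior in observations:
        by_api[api].append((site, behavior))

    reports = []
    for api, uses in by_api.items():
        counts = Counter(behavior for _, behavior in uses)
        majority, support = counts.most_common(1)[0]
        confidence = support / len(uses)
        # Only trust a belief that is well supported, and only report real deviations.
        if support >= min_support and confidence >= min_ratio and support < len(uses):
            for site, behavior in uses:
                if behavior != majority:
                    reports.append((api, site, majority, confidence))
    return reports


# Hypothetical observations: four callers check the return value, one does not.
observations = [
    ("malloc", "a.c:10", "checks_return"),
    ("malloc", "b.c:22", "checks_return"),
    ("malloc", "c.c:31", "checks_return"),
    ("malloc", "d.c:47", "checks_return"),
    ("malloc", "e.c:9", "ignores_return"),
    ("free", "a.c:90", "no_use_after"),
    ("free", "b.c:95", "no_use_after"),
]
reports = infer_deviants(observations)
```

As the note above says, the belief gets more reliable as more call sites are observed; with only two uses of free, the sketch refuses to form a belief at all.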

Scalable and systematic detection of buggy inconsistencies in source code Paper

Analysis of bugs on cloned code. Searching: hash of code snippet, AST level.
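A minimal sketch of AST-level hashing for clone search, using Python's own ast module as a stand-in. This is not the paper's tool or algorithm: the normalization (every identifier is replaced by "_"), the function names, and the sample snippets are assumptions, and the normalization is coarse enough that it cannot tell a + b from a + a.

```python
import ast
import hashlib
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Erase identifier names so that renamed clones hash identically."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_"
        return node


def structural_hash(source: str) -> str:
    """Hash of the snippet's AST after identifier normalization."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


def find_clone_groups(snippets):
    """Group snippet names by structural hash; keep only groups with more than one member."""
    groups = defaultdict(list)
    for name, source in snippets.items():
        groups[structural_hash(source)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical snippets: the first two are renamed clones, the third differs in one operator.
snippets = {
    "add": "def add(a, b):\n    return a + b\n",
    "plus": "def plus(x, y):\n    return x + y\n",
    "minus": "def minus(x, y):\n    return x - y\n",
}
clone_groups = find_clone_groups(snippets)
```

Finding the clone groups is only the search step; the interesting part of the paper is comparing the members of a group to spot the inconsistent one.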

Gap between theory and practice: an empirical study of security patches in solidity
Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset

Semantic bugs are mostly missed (51.34%) from the review process, 7 out of 96 are typos.

VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
Bug characteristics in open source software

Careless programming causes many bugs in the evaluated software. For example, simple bugs such as typos account for 7.8–15.0 % of semantic bugs in Mozilla, Apache, and the Linux kernel

Automating Code Review Activities by Large-Scale Pre-training

Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code

February 1, 2024 at 4:01 pm

Applies clustering to code to find inconsistencies.

Key insight: Our approach is inspired by the observation that many bugs in software manifest as inconsistencies deviating from their non-buggy counterparts, namely the code snippets that implement the similar logic in the same codebase. Such bugs, regardless of their types, can be detected by identifying functionally-similar yet inconsistent code snippets in the same codebase

Pros

Two-level clustering
Embedding on program dependency graph
Found bugs (at function level) in large projects
Embedding on code structures
Generality

Cons

Need repo-specific training
Literals are removed at the IR level

To abstract Constructs, we preserve only the variable types for each program statement and remove all variable names and versions.

Needs repository-specific configuration of thresholds
Still very high FP

Others

Granularity at the function level

If an inconsistent cluster contains more than a fixed number (e.g., 2) of deviating nodes (i.e., nodes in Table 2), the inconsistency is deprioritized because it is unlikely to be a true inconsistency (i.e., a single inconsistency rarely involves many deviations).

Detecting Misuses of Security APIs: A Systematic Review

December 23, 2023 at 11:23 am

Arxiv PDF

Prompt Engineering

October 25, 2023 at 9:47 pm

Reference

Few Shot

Providing Examples

Chain-of-Thought

Let's think step by step.

Tree of Thoughts

Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is...

Retrieval Augmented Generation

High-level idea: retrieve external source of information.

Step-back Prompting

Paper

RTFM! Automatic Assumption Discovery and Verification Derivation from Library Document for API Misuse Detection

October 24, 2023 at 4:03 pm

Abstract

To use library APIs, a developer is supposed to follow guidance and respect some constraints, which we call integration assumptions (IAs). Violations of these assumptions can have serious consequences, introducing security-critical flaws such as use-after-free, NULL-dereference, and authentication errors. Analyzing a program for compliance with IAs involves significant effort and needs to be automated. A promising direction is to automatically recover IAs from a library document using Natural Language Processing (NLP) and then verify their consistency with the ways APIs are used in a program through code analysis. However, a practical solution along this line needs to overcome several key challenges, particularly the discovery of IAs from loosely formatted documents and interpretation of their informal descriptions to identify complicated constraints (e.g., data-/control-flow relations between different APIs).

In this paper, we present a new technique for automated assumption discovery and verification derivation from library documents. Our approach, called Advance, utilizes a suite of innovations to address those challenges. More specifically, we leverage the observation that IAs tend to express a strong sentiment in emphasizing the importance of a constraint, particularly those security-critical, and utilize a new sentiment analysis model to accurately recover them from loosely formatted documents. These IAs are further processed to identify hidden references to APIs and parameters, through an embedding model, to identify the information-flow relations expected to be followed. Then our approach runs frequent subtree mining to discover the grammatical units in IA sentences that tend to indicate some categories of constraints that could have security implications. These components are mapped to verification code snippets organized in line with the IA sentence's grammatical structure, and can be assembled into verification code executed through CodeQL to discover misuses inside a program. We implemented this design and evaluated it on 5 popular libraries (OpenSSL, SQLite, libpcap, libdbus and libxml2) and 39 real-world applications. Our analysis discovered 193 API misuses, including 139 flaws never reported before.

System

Extract IAs from the documents
Identify the functions referred to in the IAs
Generate CodeQL queries of the target API (mis)uses

Strengths and Weaknesses

High accuracy and low FP (how?)
Identified a lot of previously-unknown bugs
New dataset
Relies heavily on well-formed documentation?
The FN rate is not very low
Generating VC takes a long while

When GPT Meets Program Analysis: Towards Intelligent Detection of Smart Contract Logic Vulnerabilities in GPTScan

October 22, 2023 at 2:14 pm

Arxiv Paper

Uses an LLM (GPT-3.5) to discover logic bugs in smart contracts

Questions

How to reduce false positives? Vulnerability confirmation.
Does temperature = 0 really increase reliability? To my understanding, it just makes the answer deterministic. They use “mimic-in-the-background” prompting; is that a good method? Are there any evaluations?
How are the Filtering Types determined for different types of vulnerabilities?

Highlights

Importance of Logic Vulnerabilities

The third group of vulnerabilities requires high-level semantical oracles for detection and is closely related to the business logic. Most of these vulnerabilities are not detectable by existing static analysis tools. This group comprises six main types of vulnerabilities: (S1) price manipulation, (S2) ID-related violations, (S3) erroneous state updates, (S4) atomicity violation, (S5) privilege escalation, and (S6) erroneous accounting.

Use of Cheaper GPT-3.5

Methodology

Challenges

A large code base cannot be fed into GPT directly as a whole => narrow down the scope

Can we break down vulnerability types in a manner that allows GPT, as a generic and intelligent code understanding tool, to recognize them directly from code-level semantics?

Unreliable answers (FP?) => confirm vulnerabilities

Some Gadgets

GPT's Disadvantages (may not be general to LLMs)

However, we found that GPT struggles to comprehend the concept of “before,”

Missing Context Can Cause FP

For these two types, the main reason for the false alarms is that these vulnerabilities require specific triggering conditions involving other related logic, which may not be contained within a single function and its callers or callees.

New Vulnerabilities

GPTScan successfully discovered 9 vulnerabilities from 3 different types, which did not appear in the audit reports of Code4rena

GPT4

We also conducted a preliminary test using GPT-4, but we did not observe a notable improvement, while the cost increased 20 times.

Prompts

Prompt for scenario and property matching

System: You are a smart contract auditor. You will be asked questions related to code properties. You can mimic answering them in the background five times and provide me with the most frequently appearing answer. Furthermore, please strictly adhere to the output format specified in the question; there is no need to explain your answer.

Scenario Matching

Given the following smart contract code, answer the questions below and organize the result in a json format like {"1": "Yes" or "No", "2": "Yes" or "No"}.

"1": [%SCENARIO_1%]?

"2": [%SCENARIO_2%]?

[%CODE%]

Property Matching

Does the following smart contract code "[%SCENARIO, PROPERTY%]"? Answer only "Yes" or "No".

[%CODE%]
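A small Python sketch of how the scenario-matching template above could be filled in and how the JSON answer could be parsed strictly. No model is called here. The function names, the scenario wording, and the contract snippet are hypothetical and not taken from GPTScan's code.

```python
import json


def build_scenario_prompt(scenarios, code):
    """Fill the scenario-matching template with numbered questions and the code."""
    example = ", ".join(f'"{i}": "Yes" or "No"' for i in range(1, len(scenarios) + 1))
    header = (
        "Given the following smart contract code, answer the questions below "
        "and organize the result in a json format like {" + example + "}."
    )
    questions = "\n".join(f'"{i}": {text}?' for i, text in enumerate(scenarios, start=1))
    return f"{header}\n{questions}\n{code}"


def parse_scenario_answer(raw, count):
    """Parse the model's JSON reply into {question number: bool}; reject anything else."""
    data = json.loads(raw)
    result = {}
    for i in range(1, count + 1):
        answer = str(data.get(str(i), "")).strip().lower()
        if answer not in ("yes", "no"):
            raise ValueError(f"question {i}: expected Yes or No, got {answer!r}")
        result[i] = answer == "yes"
    return result


# Hypothetical scenarios and contract snippet.
scenarios = [
    "Does the function deposit or mint tokens",
    "Does the function update a price from a pool balance",
]
prompt = build_scenario_prompt(scenarios, "function deposit(uint amount) public { ... }")
matches = parse_scenario_answer('{"1": "Yes", "2": "No"}', len(scenarios))
```

Rejecting any reply that is not strictly Yes/No is one cheap way to handle unreliable answers before the more expensive confirmation step.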

Prompt for finding related variables/statements.

In this function, which variable holds the value of total minted share or amount? Please answer in a section starts with "VariableA:". In this function, which variable or function holds the total supply/liquidity AND is used by the conditional branch to determine the supply/liquidity is 0? Please answer in a section starts with "VariableB:". In this function, which variable or function holds the value of the deposit/mint/add amount? Please answer in a section starts with "VariableC:". Please answer in the following json format: {"VariableA":{"Variable name":"Description"}, "VariableB":{"Variable name":"Description"}, "VariableC":{"Variable name":"Description"}}

[%CODE%]

LLM (for) Security

October 29, 2023 at 9:19 pm

Meta Research

LLM Security; twitter @llm_sec

Technical Reports

LLM for Security

Fuzzing

Program Repair

Code Analysis

The Hitchhiker’s Guide to Program Analysis: A Journey with Large Language Models

Test Generation

MISC

Large Language Models are Pretty Good Zero-Shot Video Game Bug Detectors

Expressing Information Flow Properties

September 6, 2023 at 12:06 pm

Abstract

Industries and governments are increasingly compelled by regulations and public pressure to handle sensitive information responsibly. Regulatory requirements and user expectations may be complex and have subtle implications for the use of data. Information flow properties can express complex restrictions on data usage by specifying how sensitive data (and data derived from sensitive data) may flow throughout computation. Controlling these flows of information according to the appropriate specification can prevent both leakage of confidential information to adversaries and corruption of critical data by adversaries. There is a rich literature expressing information flow properties to describe the complex restrictions on data usage required by today’s digital society. This monograph summarizes how the expressiveness of information flow properties has evolved over the last four decades to handle different threat models, computational models, and conditions that determine whether flows are allowed. In addition to highlighting the significant advances of this area, we identify some remaining problems worthy of further investigation.

Considerations

Theoretical Foundations

Lattice theory
Noninterference
Logic models (e.g., temporal logic)
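The first two foundations can be illustrated with a tiny two-point lattice and a bounded noninterference test in Python. This is a sketch under simplifying assumptions: a two-level lattice (low below high), deterministic programs, and a finite set of test inputs, so it can find violations but cannot prove their absence. All names are made up for illustration.

```python
LEVELS = {"low": 0, "high": 1}


def join(a: str, b: str) -> str:
    """Least upper bound in the two-point lattice low <= high."""
    return a if LEVELS[a] >= LEVELS[b] else b


def flows_to(src: str, dst: str) -> bool:
    """Information labeled `src` may flow to a sink labeled `dst`."""
    return LEVELS[src] <= LEVELS[dst]


def violates_noninterference(program, low_inputs, high_inputs) -> bool:
    """True if, for some fixed low input, varying the high input changes the low output."""
    for low in low_inputs:
        outputs = {program(low, high) for high in high_inputs}
        if len(outputs) > 1:
            return True
    return False


def safe(low, high):
    # The low-observable output depends on the low input only.
    return low * 2


def leaky(low, high):
    # The low-observable output reveals the parity of the secret.
    return low + (high % 2)


safe_result = violates_noninterference(safe, range(3), range(4))
leaky_result = violates_noninterference(leaky, range(3), range(4))
```

The join is what a label-tracking system computes when two values are combined: mixing low and high data yields high data, which then may not flow to a low sink.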

Threat Models

Termination: does termination leak high*?
Time: does the running time differ depending on secrets?
Interaction: input/output flow
Program code

Computational Models

Nondeterminism: the program is not deterministic
Composition of systems: e.g., feedback: the output becomes next input. Composition can lead to nondeterminism
Concurrency

Re(de)classification

Need to consider: what, where, when, and who
Introduce Delimited Release to declassify

Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

September 6, 2023 at 11:49 am

PDF Repo

Abstract

Recent advances of Large Language Models (LLMs), e.g., ChatGPT, exhibited strong capabilities of comprehending and responding to questions across a variety of domains. Surprisingly, ChatGPT even possesses a strong understanding of program code. In this paper, we investigate where and how LLMs can assist static analysis by asking appropriate questions. In particular, we target a specific bug-finding tool, which produces many false positives from the static analysis. In our evaluation, we find that these false positives can be effectively pruned by asking carefully constructed questions about function-level behaviors or function summaries. Specifically, with a pilot study of 20 false positives, we can successfully prune 8 out of 20 based on GPT-3.5, whereas GPT-4 had a near-perfect result of 16 out of 20, where the four failed ones are not currently considered/supported by our questions, e.g., involving concurrency. Additionally, it also identified one false negative case (a missed bug). We find LLMs a promising tool that can enable a more effective and efficient program analysis.

Noticeable Points

Prompt Design

Chain of Thought: Step-by-step result
Task decomposition: breaking down the huge tasks into smaller pieces
Progressive prompt: interactively feed information

Challenges in traditional code analysis

Inherent Knowledge Boundaries.

Static analysis requires domain knowledge to model special functions which cannot be analyzed. E.g., assembly code, hardware behaviors, concurrency, and compiler built-ins.

Exhaustive Path Exploration.

GPU TEE: Potential Problems

September 6, 2023 at 11:31 am

Partial GPU

Passing a partial GPU to a VM using vGPU and SR-IOV.

This shared case can be dangerous.

Isolation inside the CVM

How exactly does the GPU access the CVM's memory? Part of the memory is marked as shared, while encrypted.

Constant-Time Foundations for the New Spectre Era

February 27, 2023 at 5:48 pm

Abstract

The constant-time discipline is a software-based countermeasure used for protecting high assurance cryptographic implementations against timing side-channel attacks. Constant-time is effective (it protects against many known attacks), rigorous (it can be formalized using program semantics), and amenable to automated verification. Yet, the advent of micro-architectural attacks makes constant-time as it exists today far less useful.

This paper lays foundations for constant-time programming in the presence of speculative and out-of-order execution. We present an operational semantics and a formal definition of constant-time programs in this extended setting. Our semantics eschews formalization of microarchitectural features (that are instead assumed under adversary control), and yields a notion of constant-time that retains the elegance and tractability of the usual notion. We demonstrate the relevance of our semantics in two ways: First, by contrasting existing Spectre-like attacks with our definition of constant-time. Second, by implementing a static analysis tool, Pitchfork, which detects violations of our extended constant-time property in real world cryptographic libraries.

Methodology

A model to incorporate speculative execution.
Abstract machine: fetch, execute, retire.
Reorder buffer => out-of-order and speculative execution
Directives can be initiated to model the detailed schedule of the reordered micro-ops
Observations are also modeled in such process to mimic the leakage visible to the attacker
Modeled instructions: op, fence, load, store, br, call, ret

Artifact

Symbolic execution based on angr
Schedule the worst case reorder buffer => maximize potential leakage
Can indeed find threats

Comments

Are the threats confirmed by the developers?
Is the model formalized in a proof checker?
Execution time of instructions is not modeled? (or need not be modeled?)

PrivGuard: Privacy Regulation Compliance Made Easier

March 1, 2023 at 4:58 pm

Paper

Abstract

Continuous compliance with privacy regulations, such as GDPR and CCPA, has become a costly burden for companies from small-sized start-ups to business giants. The culprit is the heavy reliance on human auditing in today's compliance process, which is expensive, slow, and error-prone. To address the issue, we propose PrivGuard, a novel system design that reduces human participation required and improves the productivity of the compliance process. PrivGuard is mainly comprised of two components: (1) PrivAnalyzer, a static analyzer based on abstract interpretation for partly enforcing privacy regulations, and (2) a set of components providing strong security protection on the data throughout its life cycle. To validate the effectiveness of this approach, we prototype PrivGuard and integrate it into an industrial-level data governance platform. Our case studies and evaluation show that PrivGuard can correctly enforce the encoded privacy policies on real-world programs with reasonable performance overhead.

Methodology

Users can prescribe their privacy policies, and the analyst can then leverage user data for data analysis tasks. The different privacy policies are automatically enforced by PrivGuard, which runs inside a TEE.

The policy is prescribed in a formal language, and the data analysis program is statically analyzed by PrivAnalyzer to check privacy policy compliance. PrivAnalyzer uses the Python interpreter as an abstract interpreter to check whether the privacy policies might be broken by the program. Since the Python program may use many third-party libraries, the authors propose function summaries for these library functions and over-approximate the result.
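A rough sketch of how I picture the function-summary idea, in Python. This is not PrivGuard's policy language or PrivAnalyzer's API: the obligation names, the `SUMMARIES` table, and the helper functions are hypothetical. The abstract value tracks only which policy obligations are still unmet; a summary says which obligation a library call discharges, and anything without a summary is over-approximated as discharging nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractData:
    """Stands in for a dataset; tracks only the obligations not yet satisfied."""
    obligations: frozenset


# Hypothetical summaries: the obligation each library call is assumed to discharge.
SUMMARIES = {
    "drop_name_column": lambda d: AbstractData(d.obligations - {"redact_name"}),
    "mean": lambda d: AbstractData(d.obligations - {"aggregate"}),
}


def apply_call(func_name, data):
    summary = SUMMARIES.get(func_name)
    if summary is None:
        # No summary available: over-approximate by assuming nothing is discharged.
        return data
    return summary(data)


def join(left, right):
    # At a branch merge, keep every obligation that is unmet on either path.
    return AbstractData(left.obligations | right.obligations)


def residual_policy(call_sequence, policy):
    """Abstractly run a straight-line sequence of calls; return the unmet obligations."""
    data = AbstractData(frozenset(policy))
    for name in call_sequence:
        data = apply_call(name, data)
    return data.obligations
```

For example, `residual_policy(["drop_name_column", "mean"], {"redact_name", "aggregate"})` is empty (compliant), while replacing the first call with an unsummarized one leaves `redact_name` unmet, so the analysis errs on the side of rejecting the program.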

Weakness

Intentional information leakage (analyst is assumed trusted)
Language level bypasses

Sidenote

Some papers mentioned in this work are also interesting, especially those dealing with loops and branches in static analysis.

This paper may need to be checked again!

Privado: Practical and Secure DNN Inference with Enclaves

March 1, 2023 at 4:57 pm

Paper

Abstract

Cloud providers are extending support for trusted hardware primitives such as Intel SGX. Simultaneously, the field of deep learning is seeing enormous innovation as well as an increase in adoption. In this paper, we ask a timely question: "Can third-party cloud services use Intel SGX enclaves to provide practical, yet secure DNN Inference-as-a-service?" We first demonstrate that DNN models executing inside enclaves are vulnerable to access pattern based attacks. We show that by simply observing access patterns, an attacker can classify encrypted inputs with 97% and 71% attack accuracy for MNIST and CIFAR10 datasets on models trained to achieve 99% and 79% original accuracy respectively. This motivates the need for PRIVADO, a system we have designed for secure, easy-to-use, and performance efficient inference-as-a-service. PRIVADO is input-oblivious: it transforms any deep learning framework that is written in C/C++ to be free of input-dependent access patterns thus eliminating the leakage. PRIVADO is fully-automated and has a low TCB: with zero developer effort, given an ONNX description of a model, it generates compact and enclave-compatible code which can be deployed on an SGX cloud platform. PRIVADO incurs low performance overhead: we use PRIVADO with Torch framework and show its overhead to be 17.18% on average on 11 different contemporary neural networks.

Model

Service: in-enclave ML model; the model is not public
Users: data providers, input & output are secret

Attack

Infer the output label from the memory access trace collected while the user's input is being processed.

DNN contains data-dependent branches
An ML model (linear reg) is trained on memory access traces and the corresponding output labels
Can achieve high accuracy on inferring output tag from memory trace

Defense

Data dependency usually occurs in activation functions (e.g., ReLU), max pooling, etc. Other layers rarely contain data-dependent memory accesses.
Eliminate the input/secret-dependency in the Torch library
End-to-end model compilation
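The defense above boils down to replacing input-dependent branches with branch-free selects. Below is a minimal Python sketch of that idea for ReLU and max pooling over integers. It is not Privado's code (Privado rewrites the C/C++ Torch library), the function names are my own, and Python itself gives no timing or access-pattern guarantees, so this only illustrates the data flow. It assumes values, and differences between values, fit in a signed 32-bit range.

```python
def oblivious_select(cond_bit, a, b):
    """Return a if cond_bit == 1 else b, without branching on cond_bit."""
    mask = -cond_bit                  # 1 -> all ones (-1), 0 -> all zeros
    return (a & mask) | (b & ~mask)


def sign_bit(x, bits=32):
    """1 if x is negative, else 0 (assumes x fits in a signed `bits`-bit integer)."""
    return (x >> (bits - 1)) & 1


def relu(x):
    # Keep x when it is non-negative, otherwise 0; no `if` on the input value.
    return oblivious_select(1 - sign_bit(x), x, 0)


def max2(a, b):
    # b - a is negative exactly when a > b (assuming no 32-bit overflow).
    return oblivious_select(sign_bit(b - a), a, b)


def max_pool(window):
    # Every element is visited in the same order regardless of the values.
    result = window[0]
    for value in window[1:]:
        result = max2(result, value)
    return result
```

The same sequence of operations runs for every input, which is the property that removes the branch-dependent trace the attack relies on.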

Proof Complexity vs. Code Complexity

December 26, 2022 at 9:52 pm

Papers

Potential Threats of Memory Integrity on SEV(SNP), (Scalable) SGX2, and TDX

December 6, 2022 at 12:09 am

SGX2 Memory Integrity

Documents

Potential Attacks

Inside-in Aliasing
Outside-in Aliasing

Possible sources of aliasing

Server’s RAS feature

Memory Address Range Mirroring
Memory Predictive Failure Analysis (PFA)

PFA: if a physical memory page is believed to be affected by an underlying hardware fault (e.g., a weak cell or faulty row in a memory chip or DRAM), the affected page can be retired by relocating its content to another physical page, and placing the retired page on a list of physical pages that should not be subsequently allocated by the virtual memory system.

Documents

Possible attack from OS?

Via Memory Components (System Software)

Program critical system hardware devices, e.g., memory controller, DMA engines (doc1, p113). Note: DMA is controlled by the CPU in x86-64 systems.
Program page tables/EPT => inside-in alias (doc1, p113)

Other possible attack

Firmware <= defended by secure boot/PFR/Intel Hardware Shield (doc.2)

Documents

TDX Problems

References

Intel TDX

A Systematic Look at Ciphertext Side Channels on AMD SEV-SNP

November 11, 2022 at 10:29 am

Abstract

Hardware-assisted memory encryption offers strong confidentiality guarantees for trusted execution environments like Intel SGX and AMD SEV. However, a recent study by Li et al. presented at USENIX Security 2021 has demonstrated the CipherLeaks attack, which monitors ciphertext changes in the special VMSA page. By leaking register values saved by the VM during context switches, they broke state-of-the-art constant-time cryptographic implementations, including RSA and ECDSA in the OpenSSL. In this paper, we perform a comprehensive study on the ciphertext side channels. Our work suggests that while the CipherLeaks attack targets only the VMSA page, a generic ciphertext side-channel attack may exploit the ciphertext leakage from any memory pages, including those for kernel data structures, stacks and heaps. As such, AMD’s existing countermeasures to the CipherLeaks attack, a firmware patch that introduces randomness into the ciphertext of the VMSA page, is clearly insufficient. The root cause of the leakage in AMD SEV’s memory encryption—the use of a stateless yet unauthenticated encryption mode and the unrestricted read accesses to the ciphertext of the encrypted memory—remains unfixed. Given the challenges faced by AMD to eradicate the vulnerability from the hardware design, we propose a set of software countermeasures to the ciphertext side channels, including patches to the OS kernel and cryptographic libraries. We are working closely with AMD to merge these changes into affected open-source projects.

Paper

Background

CipherLeaks: ciphertext can be accessed by the hypervisor
In SEV, XEX encryption mode is applied => for a fixed address, same plaintext yields same ciphertext
Controlled side-channel: NPT (nested page table) present bit clear => PF
Before SEV-ES, the registers are saved without encryption
SNP: hypervisor cannot modify or remap guest VM pages (integrity protection)

Attack

Nginx SSL key generation -> 384-bit ECDSA key recovery

The attack exploits the constant-time swap algorithm. The ciphertext of the swap mask follows the pattern of the decision bit, so the nonce can be derived by observing the mask over the 384 iterations.
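To convince myself why a constant-time swap still leaks here, a small Python simulation. It is only a sketch under stated assumptions: the "memory encryption" is an HMAC-based stand-in for a deterministic, address-tweaked cipher (not real XEX, and not SEV's implementation), and `MASK_ADDR`, `victim_ladder`, and `attacker_recover` are hypothetical names. The victim writes a swap mask (all ones or all zeros) to the same address in every iteration; the attacker sees only ciphertext, but because equal plaintexts at a fixed address give equal ciphertexts, the two mask values are distinguishable.

```python
import hashlib
import hmac
import secrets

MEM_KEY = secrets.token_bytes(16)  # memory encryption key, unknown to the attacker
MASK_ADDR = 0x7FFC_0000            # hypothetical stack slot holding the swap mask


def encrypt_block(addr, plaintext):
    # Stand-in for deterministic address-tweaked encryption:
    # the same (address, plaintext) pair always gives the same ciphertext.
    msg = addr.to_bytes(8, "little") + plaintext
    return hmac.new(MEM_KEY, msg, hashlib.sha256).digest()[:16]


def victim_ladder(nonce_bits):
    """Return the ciphertext visible at MASK_ADDR after each ladder iteration."""
    snapshots = []
    for bit in nonce_bits:
        mask = (0 - bit) & 0xFFFF_FFFF_FFFF_FFFF  # constant-time swap mask
        snapshots.append(encrypt_block(MASK_ADDR, mask.to_bytes(16, "little")))
    return snapshots


def attacker_recover(snapshots):
    # Only two distinct ciphertexts ever appear; label each iteration by whether
    # it matches the first one. This gives the bits up to a global inversion.
    first = snapshots[0]
    return [0 if ct == first else 1 for ct in snapshots]


nonce_bits = [secrets.randbelow(2) for _ in range(384)]
labels = attacker_recover(victim_ladder(nonce_bits))
```

The recovered labels equal the nonce bits XORed with the first bit, so the attacker is left with only two candidates for the 384-bit nonce.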

Leaky DNN: Stealing Deep-learning Model Secret with GPU Context-switching Side-channel

November 8, 2022 at 9:58 pm

Abstract

Machine learning has been attracting strong interests in recent years. Numerous companies have invested great efforts and resources to develop customized deep-learning models, which are their key intellectual properties. In this work, we investigate to what extent the secret of deep-learning models can be inferred by attackers. In particular, we focus on the scenario that a model developer and an adversary share the same GPU when training a Deep Neural Network (DNN) model. We exploit the GPU side-channel based on context-switching penalties. This side-channel allows us to extract the fine-grained structural secret of a DNN model, including its layer composition and hyper-parameters. Leveraging this side-channel, we developed an attack prototype named MosConS, which applies LSTM-based inference models to identify the structural secret. Our evaluation of MosConS shows the structural information can be accurately recovered. Therefore, we believe new defense mechanisms should be developed to protect training against the GPU side-channel.

Method

Shared resource: CUPTI, NVIDIA's performance counter
CUDA, kernels, blocks => construct concurrency & GPU contention => A spy
LSTM model training => 5 models: splitting iterations; recognizing long ops; recognizing other ops; inferring hyper-parameters; model correction

Questions

How is the GPU shared between VMs? How can different drivers be kept from interfering with each other (especially when both run as root)?
Will these features (CUPTI) be available on GPUs in data centers?

Towards Formal Verification of State Continuity for Enclave Programs

October 11, 2022 at 10:54 pm

PDF

Trusted Computing Base TCB

September 8, 2022 at 9:48 pm

Blog posts

What’s a Trusted Compute Base?

Early Documents

SPECIFICATION OF A TRUSTED COMPUTING BASE (TCB)

G. H. Nibaldi, 30 November 1979 PDF

A Trusted Computing Base (TCB) is the totality of access control mechanisms for an operating system.

A TCB is a hardware and software access control mechanism that establishes a protection environment to control the sharing of information in computer systems. A TCB is an implementation of a reference monitor, as defined in [Anderson 72], that controls when and how data is accessed.

Proof that the TCB will indeed enforce the relevant protection policy can only be provided through a formal, methodological approach to TCB design and verification... Because the TCB consists of all the security-related mechanisms, proof of its validity implies the remainder of the system will perform correctly with respect to the policy.

Reference Monitor

a TCB is an implementation of a reference monitor.

complete mediation of access
self-protecting
verifiable

Minimizing the complexity of TCB software is a major factor in raising the confidence level that can be assigned to the protection mechanisms it provides.
...two general design goals to follow after identifying all security relevant operations for inclusion in the TCB are (a) to exclude from the TCB software any operations not strictly security-related so that one can focus attention on those that are, and (b) to make as full use as possible of protection features available in the hardware.

DEPARTMENT OF DEFENSE TRUSTED COMPUTER SYSTEM EVALUATION CRITERIA

DoD 5200.28 STD, 15 Aug 83 PDF

The heart of a trusted computer system is the Trusted Computing Base (TCB) which contains all of the elements of the system responsible for supporting the security policy and supporting the isolation of objects (code and data) on which the protection is based.
... In the interest of understandable and maintainable protection, a TCB should be as simple as possible consistent with the functions it has to perform. Thus, the TCB includes hardware, firmware, and software critical to protection and must be designed and implemented such that system elements excluded from it need not be trusted to maintain protection.

Trusted Computing Base (TCB) - The totality of protection mechanisms within a computer system – including hardware, firmware, and software – the combination of which is responsible for enforcing a security policy. A TCB consists of one or more components that together enforce a unified security policy over a product or system. The ability of a trusted computing base to correctly enforce a security policy depends solely on the mechanisms within the TCB and on the correct input by system administrative personnel of parameters (e.g., a user's clearance) related to the security policy.

The TCB concept now applies not only to operating systems but also to embedded systems, and it focuses on the security-critical portion of the system, including hardware and software.

Some systems (Class A1) still require a formal design specification and verification of the TCB to ensure a high degree of assurance.

Authentication in Distributed Systems: Theory and Practice

Paper

Another important concept is the ‘trusted computing base’ or TCB [9], a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security

Some weaknesses of the TCB model

S&P 1997 Paper

Authentication in Distributed Systems: Theory and Practice,

ACM Transactions on Computer Systems, 1992

It’s not quite true that components outside the TCB can fail without affecting security. Rather, the system should be ‘fail-secure’: if an untrusted component fails, the system may deny access it should have granted, but it won’t grant access it should have denied.

An Efficient TCB for a Generic Content Distribution System

2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery PDF

The trusted computing base (TCB) [1] for a system is a small amount of hardware and/or software that need to be trusted in order to realize the desired assurances. More specifically, the assurances are guaranteed even if all elements outside the TCB misbehave.
The lower the complexity of the elements in the TCB, the lower is the ability to hide malicious/accidental functionality in the TCB components. Consequently, in the design of any security solution it is necessary to lower the complexity of components in the TCB to the extent feasible.

More Recent Study

TCB Minimizing Model of Computation (TMMC)

Bushra, Naila. Mississippi State University ProQuest Dissertations Publishing, 2019. 27664004. Paper

Reducing TCB Complexity for Security-Sensitive Applications: Three Case Studies

EuroSys, 2006 PDF

The security requirements fall into four main categories: confidentiality, integrity, recoverability, and availability. For clarity, we present the definition of these terms.
Confidentiality: Only authorized users (entities, principals, etc.) can access information (data, programs, etc.).
Integrity: Either information is current, correct, and complete, or it is possible to detect that these properties do not hold.
Recoverability: Information that has been damaged can be recovered eventually.
Availability: Data is available when and where an authorized user needs it.

Justifications of Reducing TCB

Relationships between selected software measures and latent bug-density: Guidelines for improving quality

Paper

It seems that nearly all code size/complexity measurements contribute to bug density, except Method Hiding Factor and Polymorphism Factor.

This work just focuses on C++ programs. What about using a different language, e.g., Rust?

Choice of Language also Matters

Rust in the Android platform

Rust modernizes a range of other language aspects, which results in improved correctness of code:
