

## Exploring Dual Edges of SRAM Data Remanence in SoCs:

#### Covert Storage and Exfiltration Risks in TEE

Jubayer Mahmod

#### About me



#### My expertise

- Hardware-oriented system security
- Cloud FPGA security
- Fake chip detection and anti-counterfeit framework design

#### Senior Engineer @Lucid Motors' RedTeam



Hardware Security PhD @Virginia Tech Advised by Dr. Matthew Hicks







@jubayer0175

#### Disclaimer

The content of this presentation is based on my doctoral research conducted at Virginia Tech. All information shared here is publicly available from various publication venues.

It does not contain any proprietary technology and does not reflect the opinions or positions of Lucid.

## Volatile memory does not forget data instantly

Data remanence: when a memory device retains information past when it is <u>assumed to no longer exist</u>



# Static Random Access Memory (SRAM)



SRAM startup state: digital window into the analog world



#### Why Steganography?





Agent 007



# Steganography is an information-hiding technique

Hide information in "plain sight" to allow plausible deniability of its existence.

Typical steganography media





# Threat model





#### SRAM cell and its power-on state



- Designed to be **balanced**.
- At startup, one of the inverters wins the race condition.

Here, "winning" means that one inverter's pull-up network has a relatively faster rise time than the other's.

# Aging burns in data in SRAM cell



Like a negative in photography, the payload is hidden as its complement
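The burn-in idea can be sketched as a toy simulation. Everything here is an assumption made for illustration, not the measured device behavior: each cell gets a Gaussian "mismatch" value, a cell powers on to 1 when its mismatch is positive, and holding a bit under stress shifts the mismatch toward the opposite value by a hypothetical amount `shift`.

```python
import random

def power_on_state(mismatch):
    """A cell powers on to 1 when its mismatch favors that side (toy rule)."""
    return [1 if m > 0 else 0 for m in mismatch]

def age(mismatch, stored_bits, shift):
    """Holding a bit under stress pushes the cell toward the opposite value."""
    return [m - shift if b else m + shift for m, b in zip(mismatch, stored_bits)]

def decode(snapshot):
    """The payload is the complement of the post-stress power-on state."""
    return [1 - s for s in snapshot]

def accuracy(a, b):
    """Fraction of positions where the two bit lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)                       # fixed seed for repeatability
n_cells = 4096
mismatch = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]   # fresh device
payload = [rng.randint(0, 1) for _ in range(n_cells)]

for shift in (0.0, 1.0, 2.0):                # hypothetical stress strengths
    aged = age(mismatch, payload, shift)
    recovered = decode(power_on_state(aged))
    print(f"shift={shift:.1f}  accuracy={accuracy(payload, recovered):.3f}")
```

With no stress the recovered bits are roughly a coin flip against the payload; as the assumed shift grows, more cells power on to the complement of what they held and the accuracy rises.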

# Accelerating aging condition

Aging takes **decades** to impact performance.



- Aging SRAM while it holds all 1s reduces the number of 1s at subsequent power-ons
- The aging effect is logarithmic: the rate of change decreases over time
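One way to read "logarithmic" is the toy model below. The functional form and the symbols $\Delta$ (accumulated shift), $A$ (scale) and $\tau$ (time constant) are assumptions for illustration, not values from the slides.

$$
\Delta(t) = A \ln\!\left(1 + \frac{t}{\tau}\right), \qquad \frac{d\Delta}{dt} = \frac{A}{\tau + t}
$$

The rate $A/(\tau + t)$ falls as $t$ grows. Under this model, for $t \gg \tau$ each doubling of stress time adds only about $A \ln 2$, which is why elevated voltage and temperature matter more than simply waiting longer.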



## Data encoding process







#### InvisibleBits evaluation

Retrieval error?

Plausibly deniable?

#### Errors without any ECC

Figure: experimental setup with a debug host, controller, debugger, power source, and thermal chamber (panels (a) and (b)).

| Device         | SRAM usage  | Vacc. | Tacc. | Accuracy | Encoding time |
|----------------|-------------|-------|-------|----------|---------------|
| ATSAML11E16A   | Main memory | 4.8V  | 85°C  | 97.2%    | 16 hours      |
| MSP432P401     | Main memory | 3.3V  | 85°C  | 93.5%    | 10 hours      |
| LPC55S69JBD100 | Main memory | 5.5V  | 85°C  | 88.5%    | 24 hours      |
| BCM2837        | Cache       | 2.2V  | 85°C  | 79.2%    | 120 hours     |

\*Acceleration voltages are derived from experiments & datasheets

#### Improving accuracy



#### Plausible deniability



#### Full system implementation



Other evaluations performed: 1) source of error, 2) recovery, 3) multi-snapshot adversary



Jubayer Mahmod & Matthew Hicks, "Untrustzone: Systematic Accelerated Aging to Expose On-chip Secrets," in IEEE Security & Privacy'24.

\*We made this paper public only after ARM released an architecture security advisory.

#### Takeaways: InvisibleBits

SRAM power bus is accessible from outside of the SoC



Stored data directs future power-on state

# Security threats are on the rise: hardware is in the spotlight

As software security improves, attackers shift their focus to exploiting lower-level system components.



"Ultimately, hardware is the foundation for digital trust. A compromised physical component can undermine all additional layers of a system's cybersecurity to devastating effect. Hardware security, therefore, focuses on protecting systems against the vulnerabilities at the physical layer of devices"

FORUM

# Security perimeter reduction helps prevent many physical attacks

Keeping sensitive plaintext in on-chip SRAM reduces the risk of off-chip physical attacks (e.g. cold boot)

Enforcing secure execution prevents illegitimate access to secure memory areas

Security attribute change = Memory erasure



## TrustZone fundamentals

Divides a system into the Secure World and the Normal World

The non-secure state cannot access secure memory areas

A software **bug** in the non-secure state cannot access secure memory



Cache lines are **physically shared** between the Worlds

The NS tag bit indicates the security state of a cache line



TrustZone controls security attributes, but physical memory is shared between the *Worlds*.
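A minimal sketch of that split between logical isolation and shared storage. The `CacheLine` and `ToyCache` names are hypothetical, and the model ignores sets, ways, and eviction; it only shows the NS bit taking part in the hit check while both Worlds' lines sit in one array.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    ns: bool          # True = line was filled by a non-secure access
    data: bytes

class ToyCache:
    """Single physical array of lines shared by both Worlds (toy model)."""

    def __init__(self):
        self.lines = []

    def fill(self, tag, ns, data):
        self.lines.append(CacheLine(tag, ns, data))

    def lookup(self, tag, ns_access):
        """A hit requires the address tag AND the NS attribute to match."""
        for line in self.lines:
            if line.tag == tag and line.ns == ns_access:
                return line.data
        return None   # miss

cache = ToyCache()
cache.fill(tag=0x40, ns=False, data=b"secret key")   # Secure World line
cache.fill(tag=0x40, ns=True, data=b"public data")   # Normal World line

secure_view = cache.lookup(0x40, ns_access=False)
normal_view = cache.lookup(0x40, ns_access=True)
```

The lookup keeps the two Worlds apart, yet both entries live in the same `lines` list. That mirrors the point above: the attribute check is logical, while the storage cells that age are physical and common.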





#### Overarching threat model

Secrets in on-chip SRAM are guarded by TrustZone

Attackers have physical access

Target-information- and SoC-specific threat models





Exfiltrate secrets from cache

#### Technical challenges

Overdrive SRAM's power bus

Capture SRAM's power-on state using software interface

Reduce contamination of SRAM power-on state



#### Test Platforms and SoCs





| System-on-Chip        | Core                     | SRAM size          | TrustZone | Access to uncontaminated power-on state | Aging acceleration | Manufacturer        |
|-----------------------|--------------------------|--------------------|-----------|-----------------------------------------|--------------------|---------------------|
| ATSAML11E16A [59]     | ARM Cortex-M23           | 16KB               | ✓         | ✓                                       | ✓                  | Microchip           |
| LPC55S69JBD100 [62]   | Dual-core ARM Cortex-M33 | 320KB              | ✓         | ✓                                       | ✓                  | NXP                 |
| M263KIAAE [21]        | ARM Cortex-M23           | 96KB               | ✓         | ✓                                       | ✓                  | Nuvoton             |
| M2351SFSIAAP [19]     | ARM Cortex-M23           | 96KB               | ✓         | ✓                                       | ✓                  | Nuvoton             |
| M252KG6AE [20]        | ARM Cortex-M23           | 32KB               | ✓         | ✓                                       | ✓                  | Nuvoton             |
| M251SD2AE [20]        | ARM Cortex-M23           | 12KB               | ✓         | ✓                                       | ✓                  | Nuvoton             |
| STM32L562 [85]        | ARM Cortex-M33           | 40KB               | ✓         | ✓                                       | ✓                  | STMicroelectronics  |
| BCM2837 (RPi3) [69]   | Quad-core ARM Cortex-A53 | L1:128KB, L2:512KB | ✓         | ✓                                       | ✓                  | Broadcom            |
| BCM2711 (RPi4) [70]   | Quad-core ARM Cortex-A72 | L1:320KB, L2:1MB   | ✓         | ✓                                       | ✓                  | Broadcom            |
| R7FS1JA783A01CFM [25] | ARM Cortex-M23           | 32KB               | ✗         | ✓                                       | ✓                  | Renesas Electronics |
| MSP432P401 [35]       | ARM Cortex-M4            | 64KB               | ✗         | ✓                                       | ✓                  | Texas Instruments   |
| MSP430G2553 [36]      | MSP430 single cycle      | 0.5KB              | ✗         | ✓                                       | ✓                  | Texas Instruments   |
| EFM32WG990F256 [82]   | ARM Cortex-M4            | 32KB               | ✗         | ✓                                       | ✓                  | Silicon Labs        |

#### Exfiltrate an AES key from TrustZone



Figure: power-on state transitions, pre-stress vs. post-stress (columns: power-on state, stored data, interpreted data, % of bits, transition type, correctness).

| % of bits | Transition type  |
|-----------|------------------|
| 2.19%     | Flipping failure |
| 30.31%    | Flipping success |
| 23.11%    | Reinforcing      |
| 26.41%    | Reinforcing      |
| 17.38%    | Flipping success |
| 0.61%     | Flipping failure |

**AES Key**

....6c6c2068696d2077656c746869736973617365637265746b65794....

..6c6c246a696d2077656c746869736973677365637265746b65799...

- Key extraction scenario #1: error rate 2.8%, key search space $\approx 2^{23}$
- Key extraction scenario #2: error rate 1.27%, key search space $\approx 2^{13}$

Memory map labels: 0x20000800, Secure, 0x20002000, Non-secure, 0x20003FFF (64-byte regions marked).

**Pre-stress SRAM snapshot** 

**Retrieved information** 
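The slide does not say how the key search-space figures were derived. One reading that lands near them, sketched here under stated assumptions: the key is 128 bits, the attacker knows roughly how many bits are wrong (error rate times key length, rounded up), and the search enumerates every placement of that many bit flips. The helper name `search_space_bits` is hypothetical.

```python
import math

def search_space_bits(key_bits, error_rate):
    """log2 of the number of ways to place the expected bit errors in the key."""
    n_errors = math.ceil(error_rate * key_bits)
    return math.log2(math.comb(key_bits, n_errors))

for label, err in (("scenario #1", 0.028), ("scenario #2", 0.0127)):
    print(label, round(search_space_bits(128, err), 1))
```

Under these assumptions a few percent of bit errors leaves a search space that is trivially small compared with the full $2^{128}$ keyspace.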


#### Exfiltrate proprietary firmware



"Case: Cache-assisted Secure Execution on ARM Processors" Oakland'16

# Exfiltrate proprietary firmware

|                      | LPC1   | LPC2   | LPC3   | Combined |
|----------------------|--------|--------|--------|----------|
| Scenario #1 accuracy | 87.70% | 86.70% | 88.50% | 95.82%   |
| Scenario #2 accuracy | 93.20% | 91.76% | 93.36% | 98.29%   |
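The table does not state how the "Combined" column was computed. A plausible reading is a bitwise majority vote across the three devices; the sketch below estimates what such a vote would achieve if the devices' bit errors were independent (an assumption that real devices may not satisfy). The function name `majority_accuracy` is hypothetical.

```python
from itertools import product

def majority_accuracy(accuracies):
    """Probability that a bitwise majority vote is correct, given each
    device's per-bit accuracy and assuming independent errors."""
    n = len(accuracies)
    total = 0.0
    for outcome in product((True, False), repeat=n):   # True = device correct
        p = 1.0
        for correct, acc in zip(outcome, accuracies):
            p *= acc if correct else 1.0 - acc
        if sum(outcome) > n / 2:                       # majority got it right
            total += p
    return total

scenario_1 = majority_accuracy([0.8770, 0.8670, 0.8850])
scenario_2 = majority_accuracy([0.9320, 0.9176, 0.9336])
print(f"{scenario_1:.2%}  {scenario_2:.2%}")
```

Both estimates fall within a fraction of a percentage point of the reported combined accuracies, which is consistent with, but does not prove, a majority-vote combination.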



Visual demonstration of firmware burn-in



Secret placement influences accuracy

#### Exfiltrate secrets from cache

#### Victim software executes from CPU

# Accelerated aging burns in cache lines in the analog domain





Elevated voltage

Stress time

Post-stress data extraction

Heat

Introduces a 'fake kernel'

Stops cores from enabling caches (disabled MMU)

On request, dumps cache lines into system RAM (using the co-processor interface and RAM indexing)



Assumes secret data (attack #1) and proprietary software (attack #2) **are in the on-chip cache** (attack #3)

The AES key extraction accuracy **reaches 93.2% after** 120 hours of aging (2.025× nominal voltage and $T = 85^{\circ}C$)

## Q&A







#### Backup slides

#### Message extraction error: source (1)



- Thermal
- Long time

#### Performance comparison

Flash **program-time-based** scheme achieves **0.05%** capacity (256KB Flash carries 131B) [Oakland'15]

Flash **program-voltage-based** scheme improves capacity by **2x** [USENIX FAST'18]

InvisibleBits (with 5 copies at <3% error) carries 12.8KB (100x)

Table: qualitative comparison of the Flash program-time-based scheme, the Flash program-voltage-based scheme, and InvisibleBits on ubiquity, capacity, resilience, and read stability (rating scale: Excellent, Very good, Good, Fair, Poor).
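The capacity figures can be checked with a little arithmetic, and the "5 copies" can be read as simple repetition coding. Both readings are assumptions: the 64KB SRAM size is inferred because 64KB / 5 gives the quoted 12.8KB, and bitwise majority voting is one straightforward way to decode redundant copies, not necessarily the scheme used. The helper names are hypothetical.

```python
def usable_capacity(total_bytes, copies):
    """Payload bytes left when every bit is stored `copies` times."""
    return total_bytes / copies

def majority_decode(copies):
    """Bitwise majority vote over an odd number of equal-length byte strings."""
    out = bytearray(len(copies[0]))
    for i in range(len(out)):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones * 2 > len(copies):
                out[i] |= 1 << bit
    return bytes(out)

# Capacity arithmetic from the slide (64 KB SRAM is an assumed device size).
flash_payload = 131                                  # bytes, from the slide
flash_fraction = flash_payload / (256 * 1024)        # share of a 256 KB Flash
sram_payload = usable_capacity(64 * 1024, copies=5)  # bytes
print(f"{flash_fraction:.2%}, {sram_payload / 1024:.1f} KB, "
      f"{sram_payload / flash_payload:.0f}x")

# Five noisy copies of one byte: each bit is wrong in at most two copies.
noisy = [b"\x5a", b"\x5b", b"\x5a", b"\xda", b"\x5a"]
print(majority_decode(noisy))
```

Repetition trades capacity for accuracy: five copies cut the payload to a fifth of the SRAM, but a bit is lost only when at least three copies of it are wrong.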

#### Message extraction error: source (2)




#### Takeaways: A new data hiding technique

Covert: Information stays in the hardware layer

Erase/write tolerant: Digitally indestructible

Ubiquitous: Can be implemented in almost any device



High capacity: 100x compared to state-of-the-art

#### Qualitative exploration of defensive landscape

#### Initializing the SRAM at startup

- Needs to wipe out the SRAM at startup
- Slows down boot speed
- Eliminates useful applications of the SRAM power-on state

#### Scrambling SRAM data at runtime

- Complement data at runtime to reduce the burn-in effect ($0xAA \rightarrow 0x55$)
- Core freezing will prevent software mitigation
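A minimal software sketch of the complementing idea, assuming a hypothetical `ComplementingBuffer` wrapper whose `toggle` method would be driven by a periodic timer. It models the bookkeeping only; a real mitigation would have to live in firmware or hardware.

```python
class ComplementingBuffer:
    """Stores data XORed with a mask that is toggled periodically, so each
    cell spends roughly equal time holding 0 and 1 (toy software model)."""

    def __init__(self, data: bytes):
        self._mask = 0x00
        self._cells = bytearray(data)      # what physically sits in SRAM

    def toggle(self):
        """Complement every stored byte, e.g. 0xAA becomes 0x55."""
        self._mask ^= 0xFF
        for i in range(len(self._cells)):
            self._cells[i] ^= 0xFF

    def read(self) -> bytes:
        """Return the logical data regardless of the current polarity."""
        return bytes(b ^ self._mask for b in self._cells)

    def raw(self) -> bytes:
        """Return the bytes as they are physically stored."""
        return bytes(self._cells)

buf = ComplementingBuffer(b"\xaa\x0f")
before = buf.raw()
buf.toggle()                               # would run from a periodic timer
after = buf.raw()
```

The logical contents never change for the reader, while the stored pattern alternates between a value and its complement. As the second bullet notes, an attacker who freezes the core stops `toggle` from ever running, so the cells age with whatever polarity they held at that moment.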

#### Preventing aging acceleration

- Prevent overvoltage
- Bypass excess energy before it reaches the core