Learning by Breaking - A LayerZero Case Study - Part 2

Or C.
Mar 1, 2024
7 min read

Updated: Mar 2, 2024

In part one we introduced LayerZero, its anatomy and key components, and showed the critical flaw of ULNv1. Be sure to read it first, as we'll assume it's well understood. Today we'll discuss Stargate, the liquidity layer built on top of LayerZero, and dig into two high-severity DoS vulnerabilities we've identified in it.

Intro

Stargate was the first dApp launched on LZ and, at the time of writing, holds $277M in value across 15 chains in its liquidity pools. It's developed and maintained by the LZ team and offers a private $15M bug bounty, which is being migrated to Immunefi.

Anatomy

In Stargate, LPs and swap users interact exclusively through the Router contract. Funds are locked in Pool instances, one per token. Each pool is paired with one or more remote pools to form "chain paths". Since Stargate is a liquidity layer, paths connect same-value assets, e.g. USDC on Base <> USDC on mainnet. To incentivize deep liquidity and balance between chains, Stargate imposes an "equilibrium fee".

The main ways to interact with Stargate are:

addLiquidity() - deposit tokens for LP tokens
redeemRemote() - burn local LP tokens for tokens on remote chain
redeemLocal() - burn local LP tokens for tokens on local chain
instantRedeemLocal() - burn local LP tokens for tokens on local chain, without invoking the bridge
swap() - send tokens on local chain for tokens on remote chain

The functions are color-coded by the number of cross-chain messages each one needs: pink (addLiquidity(), instantRedeemLocal()) is instant, orange (redeemRemote(), swap()) requires one message, while yellow (redeemLocal()) requires a rebound message back to the local chain.

The implementation pattern is:

User calls a Router function
- Router invokes the relevant Pool function to update state
- Router invokes the appropriate Bridge function to encode and send a LZ message to the remote Bridge
Remote Bridge receives the message
- Decodes the message and calls the appropriate Router functions
- Router calls Pool functions to update state / release tokens
If a rebound message is necessary (yellow), the details are stored in a Router mapping, and a separate external TX performs the call that bridges a message back.
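The source-side and destination-side steps above can be sketched as a toy two-chain model for a single-message swap. This is a simplified illustration under our own assumptions, not Stargate's code: Pool, Chain, swap and deliver are hypothetical names, and fees, LP accounting and the Endpoint itself are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical per-token pool: only tracks its token balance."""
    balance: int = 0

    def lock(self, amount: int) -> None:
        self.balance += amount

    def release(self, amount: int) -> int:
        if amount > self.balance:
            raise ValueError("insufficient liquidity")
        self.balance -= amount
        return amount


@dataclass
class Chain:
    name: str
    pool: Pool = field(default_factory=Pool)
    released: list = field(default_factory=list)


def swap(src: Chain, dst: Chain, amount: int, channel: list) -> None:
    """Source chain: the Router updates the local Pool, then the Bridge sends one message."""
    src.pool.lock(amount)                       # Router -> Pool
    channel.append(("SWAP", dst.name, amount))  # Router -> Bridge -> LZ message


def deliver(dst: Chain, channel: list) -> None:
    """Destination chain: the Bridge decodes, the Router has the Pool release tokens."""
    kind, to, amount = channel.pop(0)           # ordered channel: oldest message first
    assert kind == "SWAP" and to == dst.name
    dst.released.append(dst.pool.release(amount))


mainnet = Chain("mainnet", Pool(1_000))
base = Chain("base", Pool(1_000))
channel = []
swap(mainnet, base, 100, channel)
deliver(base, channel)
print(mainnet.pool.balance, base.pool.balance)  # 1100 900
```

The channel is a plain FIFO list here to mirror LZ's in-order delivery between a pair of Bridges, which is what the bugs below exploit.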

With the basic knowledge out of the way, it's time to discuss some bugs.

Bug #1 - The Solidity quirk

As mentioned, when swap() or redeemRemote() is executed, the local Bridge sends a message to the remote Bridge. The Bridge's lzReceive() executes the code below:

The Router runs _swapRemote():

An important feature of Stargate is allowing the swap recipient to execute arbitrary logic. The swapper passes a bytes payload to swap(), and it is forwarded to the recipient's sgReceive() entry point, as seen above. The swapper also pays for dstGasForCall, the amount of gas passed to the recipient's execution.

Note that it is mission-critical that executing a swap doesn't revert at the Bridge level. If it does, no other payloads in the src Bridge <> dest Bridge channel can be delivered until the current payload is resolved (the ordering guarantees discussed in part 1). That's why LZ is careful to catch any exception thrown from the user's sgReceive(), store the error in a local cache, and return gracefully. This is an example of the general approach LZ recommends to all integrators.
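A minimal sketch of that "catch, cache, return" pattern, written in Python rather than Solidity. The Bridge class, lz_receive and cached_swaps are hypothetical stand-ins; a Python exception models a revert inside the callee, and the except clause plays the role of catch.

```python
class Bridge:
    """Hypothetical model of a Bridge that must never revert because of user code."""

    def __init__(self):
        self.cached_swaps = {}  # (src_chain_id, nonce) -> payload kept for a later retry

    def lz_receive(self, src_chain_id, nonce, receiver, payload):
        # ... pool accounting and the token transfer to the recipient would go here ...
        if not payload:
            return "delivered"          # no callback requested
        try:
            receiver(payload)           # stands in for sgReceive()
            return "delivered"
        except Exception:               # any revert raised by the callee
            self.cached_swaps[(src_chain_id, nonce)] = payload
            return "cached"             # return gracefully; the channel keeps flowing


def reverting_receiver(payload):
    raise RuntimeError("user logic reverted")


bridge = Bridge()
print(bridge.lz_receive(101, 1, reverting_receiver, b"\x01"))  # cached
```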

There's one special case, though, that wasn't considered: in Solidity, if the callee in a try/catch statement is not a contract, the statement reverts without entering the catch clause. For some reason, that detail isn't mentioned in the Solidity docs. This introduces a very simple DoS vector:

The attacker swaps a tiny amount over every Bridge<>Bridge pair
They pass a 1-byte payload to a non-contract recipient, e.g. 0x11111...111
Delivery reverts in the Bridge
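The quirk comes down to where the failure is raised. For a call to a function without return values, such as sgReceive(), Solidity emits a check that the target has code in the caller, before the CALL itself, so a failed check reverts the calling frame rather than reaching catch. Below is a simplified Python model of that ordering; try_external_call, code_at and Revert are hypothetical names, and code_at maps an address to a callable representing its code.

```python
class Revert(Exception):
    """A revert raised in the calling frame (the Bridge), not in the callee."""


def try_external_call(code_at, target, payload):
    """Model of `try Receiver(target).sgReceive(...) {} catch {}` in the caller."""
    if target not in code_at:                    # extcodesize(target) == 0
        raise Revert("call to non-contract")     # raised outside the try: no catch
    try:
        code_at[target](payload)                 # the actual CALL into the callee
        return "success"
    except Exception:
        return "caught"                          # only callee reverts land here


code_at = {"0xreceiver": lambda payload: None}
print(try_external_call(code_at, "0xreceiver", b"\x01"))  # success
# try_external_call(code_at, "0x1111111111111111111111111111111111111111", b"\x01")
# raises Revert: a 1-byte payload to a non-contract reverts the Bridge itself.
```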

When the Bridge reverts, the message is stored at the Endpoint for re-delivery and blocks future messages in the channel. But since the code path always leads to a revert, every re-delivery fails as well, permanently locking the channel.

There is a final escape hatch the devs could use to lift the freeze: the LZ Endpoint has the following function:

In other words, the Bridge owner can remove a stored payload destined for the Bridge. It's a manual call, possible only as long as ownership is not renounced. Still, since the DoS vector is so cheap and practical, it can be sustained for an extended period, and a sophisticated actor can backrun a forceResumeReceive() call with another reverting message, maintaining the freeze.
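The blocking behaviour, the escape hatch and the backrun can be simulated with a toy ordered channel. This is a simplified sketch: Endpoint, send, retry_payload and force_resume_receive are hypothetical Python names (the last standing in for forceResumeReceive()), and real messages wait off-chain with the Relayer rather than in an on-chain queue.

```python
from collections import deque


class Endpoint:
    """Toy ordered channel: one stored (failed) payload blocks everything behind it."""

    def __init__(self, receive):
        self.receive = receive    # the app's lzReceive(); raising models a revert
        self.queue = deque()      # messages waiting for delivery, in nonce order
        self.stored = None        # payload that reverted and is kept for re-delivery
        self.delivered = []

    def send(self, payload):
        self.queue.append(payload)
        self._drain()

    def _drain(self):
        while self.stored is None and self.queue:
            payload = self.queue.popleft()
            try:
                self.receive(payload)
                self.delivered.append(payload)
            except Exception:
                self.stored = payload          # the channel is now blocked

    def retry_payload(self):
        """Re-deliver the stored payload; raises again if the code path always reverts."""
        self.receive(self.stored)
        self.delivered.append(self.stored)
        self.stored = None
        self._drain()

    def force_resume_receive(self):
        """Owner-only escape hatch: drop the stored payload and resume delivery."""
        self.stored = None
        self._drain()


def bridge_receive(payload):
    if payload == "poison":                    # e.g. a payload to a non-contract
        raise RuntimeError("revert")


ep = Endpoint(bridge_receive)
ep.send("poison")
ep.send("honest-1")                            # stuck behind the poisoned message
ep.force_resume_receive()                      # owner clears it: honest-1 goes through
ep.send("poison")                              # attacker backruns with another one
ep.send("honest-2")
print(ep.delivered, ep.stored, list(ep.queue))  # ['honest-1'] poison ['honest-2']
```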

Disclosure

After discovering the issue, we immediately reported it to the team on 09/06/23. Much to our surprise, the team said it was well known to them and handled "in the validation library". We couldn't find it anywhere in the Stargate repo, so we ended up checking the LayerZero repo, and it turns out the fix is in the MPTValidator contract!

To be clear, this validation is performed on every single message bridged via LayerZero! If the destination is the Stargate Bridge, the message is a swap call, there is a payload and the target is not a contract, then the payload is zeroed out (see securePayload). This solves the issue, because when the payload is empty, the Bridge never calls sgReceive().
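A minimal sketch of that validation step, assuming the message has already been decoded. The real check operates on the raw encoded payload inside MPTValidator; Message, secure_payload, has_code and the TYPE_SWAP_REMOTE value below are hypothetical. The condition that matters is a non-empty payload addressed to a recipient without code, since that is exactly the case the try/catch cannot handle.

```python
from dataclasses import dataclass, replace

TYPE_SWAP_REMOTE = 1  # hypothetical value for the Bridge's swap message type


@dataclass(frozen=True)
class Message:
    dst_app: str          # receiving application (e.g. the Stargate Bridge)
    function_type: int    # Bridge message type
    to: str               # swap recipient on the destination chain
    payload: bytes        # user payload that would be forwarded to sgReceive()


def secure_payload(msg, stargate_bridge, has_code):
    """Empty the payload when delivering it could only revert the Bridge itself."""
    if (
        msg.dst_app == stargate_bridge
        and msg.function_type == TYPE_SWAP_REMOTE
        and len(msg.payload) > 0
        and not has_code(msg.to)
    ):
        return replace(msg, payload=b"")   # the Bridge skips sgReceive() when empty
    return msg


contracts = {"0xreceiver"}
attack = Message("stargate-bridge", TYPE_SWAP_REMOTE, "0x1111", b"\x01")
print(secure_payload(attack, "stargate-bridge", contracts.__contains__).payload)  # b''
```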

The issue was fixed around a year before we rediscovered it, and the team said they had found it internally. We found it extremely surprising that it was fixed at a completely different layer; it shows how tightly coupled Stargate and LayerZero really are.

Bug #2 - Same same, but different

After taking the L on the previous bug, we looked for alternative ways to DoS the Bridge. Callbacks are very often weak spots, doubly so for LayerZero applications, where issues with them can easily translate into blocking the entire channel, as seen above. We were therefore determined to take another look at the sgReceive() snippet.

There seemed to be no additional ways to directly revert the try/catch. At times like these, we turn our attention to gas-related attacks. For any Stargate request, the source Bridge charges bridging fees to the user. For swaps, the user pays for a fixed 175,000 gas plus any dstGasForCall for their callback. If we can make delivery revert even when it is supplied with more than the paid-for gas, we have a DoS vector:
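The condition can be written down directly. This is a simplification that only compares totals: BASE_GAS is the 175,000 figure quoted above, and paid_gas and is_dos_vector are hypothetical helpers.

```python
BASE_GAS = 175_000  # fixed gas the user pays for the Bridge/Router/Pool logic of a swap


def paid_gas(dst_gas_for_call: int) -> int:
    """Total destination gas covered by the user's bridging fee."""
    return BASE_GAS + dst_gas_for_call


def is_dos_vector(gas_needed: int, dst_gas_for_call: int) -> bool:
    """True when delivery needs more than was paid for: either the Relayer
    subsidises the difference, or delivery reverts and the channel freezes."""
    return gas_needed > paid_gas(dst_gas_for_call)


print(is_dos_vector(gas_needed=200_000, dst_gas_for_call=50_000))  # False
print(is_dos_vector(gas_needed=300_000, dst_gas_for_call=50_000))  # True
```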

Either the Relayer supplies gas it wasn't paid for to the Bridge
Or execution reverts, freezing the Bridge

In practice, the 175,000 seemed to include large buffers, so it looked difficult to exceed. We tried a variation of the returndata-bomb attack, in which the external call returns a large bytes blob to consume the caller's gas. Since the try statement doesn't capture the return value, returndata is useless there. However, we can abuse the catch statement and revert with a large message, which will be copied into bytes memory reason. Unfortunately, we weren't able to waste enough gas for the attack to be practical.
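For a sense of scale, the cost involved here is EVM memory expansion, which both the callee (to build the revert data) and the caller (to copy it into reason) pay. The sketch below uses the Yellow Paper memory cost of 3 gas per 32-byte word plus words * words / 512, and ignores the per-word copy fee of RETURNDATACOPY; the helper names are ours.

```python
def memory_cost(words: int) -> int:
    """Total EVM memory cost for `words` 32-byte words: 3a + floor(a^2 / 512)."""
    return 3 * words + words * words // 512


def expansion_cost(size_before: int, extra_bytes: int) -> int:
    """Incremental gas to grow memory by `extra_bytes`, starting at `size_before` bytes."""
    before = (size_before + 31) // 32
    after = (size_before + extra_bytes + 31) // 32
    return memory_cost(after) - memory_cost(before)


# A 32 KiB revert reason: the quadratic term makes it pricier in a frame
# that already uses memory than in a fresh one.
print(expansion_cost(0, 32_768))       # 5120
print(expansion_cost(32_768, 32_768))  # 9216
```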

However, the idea did lead us to an interesting discovery:

The catch statement copies the entire payload into storage! SSTOREs are a different ballgame: any zero to non-zero SSTORE on a fresh slot costs 22.1k gas (20,000 plus the 2,100 cold-slot surcharge). With payloads hard-capped at 10,000 bytes at the LZ level, and each SSTORE storing 32 bytes, that comes to 313 operations, totaling almost 7,000,000 gas! There's also the LOG opcode in emit CachedSwapSaved(), which logs the entire payload, spending ~80k gas (8 gas per byte).
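The arithmetic behind these numbers, as a quick sanity check. Like the estimate above, it counts only the payload's data slots (not the length slot or the other cached fields) and only the per-byte data cost of the LOG. The constants are the EIP-2200 set cost, the EIP-2929 cold-slot surcharge and the 8 gas per byte of LOG data; the helper names are ours.

```python
import math

SSTORE_SET = 20_000   # zero -> non-zero storage write (EIP-2200)
COLD_SLOT = 2_100     # first access to a slot in the transaction (EIP-2929)
LOG_PER_BYTE = 8      # per byte of LOG data
MAX_PAYLOAD = 10_000  # payload cap enforced at the LZ level


def storage_gas(payload_len: int) -> int:
    slots = math.ceil(payload_len / 32)          # 32 bytes per SSTORE
    return slots * (SSTORE_SET + COLD_SLOT)


def log_data_gas(payload_len: int) -> int:
    return payload_len * LOG_PER_BYTE            # excludes the LOG base and topic costs


print(math.ceil(MAX_PAYLOAD / 32))  # 313
print(storage_gas(MAX_PAYLOAD))     # 6917300
print(log_data_gas(MAX_PAYLOAD))    # 80000
```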

The exploit is again very simple:

Swap to a contract address with a payload that costs over 175k gas to store (far fewer than 10,000 bytes are needed). Repeat for each bridging channel.
The malicious contract spends the incoming gas and reverts.
The Bridge reverts due to OOG. The payload is stored at the Endpoint and blocks further bridging. The Relayer could unfreeze the channel by passing 7M+ gas.
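How small can the payload be? A rough sketch using Solidity's storage layout for bytes (values of 31 bytes or fewer share one slot with their length; longer values use a length slot plus one slot per 32 bytes) and counting only those writes against the fixed 175,000 gas. Everything else the Bridge does also eats into that allowance, so the real threshold is lower; the helper names are ours.

```python
SLOT_WRITE = 22_100   # cold zero -> non-zero SSTORE
BASE_GAS = 175_000    # fixed allowance paid for by the swapper


def bytes_storage_slots(n: int) -> int:
    """Slots written when storing a Solidity `bytes` value of length n (n >= 1)."""
    if n <= 31:
        return 1                       # short form: data and length share a slot
    return 1 + (n + 31) // 32          # long form: length slot + data slots


def min_payload_exceeding(budget: int) -> int:
    """Smallest payload length whose storage writes alone exceed `budget`."""
    n = 1
    while bytes_storage_slots(n) * SLOT_WRITE <= budget:
        n += 1
    return n


print(min_payload_exceeding(BASE_GAS))  # 193
```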

Disclosure

The report was submitted on 03/07/23, and we received a response 48 hours later. Stargate confirmed the issue, but said they had already discovered it internally and sent a fix to their auditors one week prior. We requested a Proof of Prior Knowledge, which we received from the auditors. We offered to help review the fix, which we did once they issued it in October (covered in the next section). We've requested public disclosure and consideration for a goodwill bounty for our efforts (similarly to samczsun), but to date neither has been offered. We respect that decision, but do think the ultra-secretive approach to security that LayerZero employs is counterproductive in the long term. It's been 8 months and there hasn't been the tiniest public acknowledgement of this DoS issue.

The Fix

As with bug #1, Stargate didn't redeploy the Router. Instead, they introduced another layer of abstraction: StargateComposer. Users are no longer allowed to provide a payload for swaps directly to the Router; they must call StargateComposer's swap(), which wraps their payload and re-targets it to the StargateComposer on the receiving chain. The destination Composer safely unpacks the original destination and calls it.
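A sketch of the wrapping idea, assuming the original recipient is packed as a 20-byte prefix in front of the user's payload. The exact on-chain encoding may differ; wrap and unwrap are hypothetical helpers, and addresses are raw 20-byte values.

```python
ADDRESS_LEN = 20


def wrap(original_to: bytes, user_payload: bytes, remote_composer: bytes):
    """Source Composer: re-target the swap to the remote Composer, carrying the
    real recipient inside the payload."""
    if len(original_to) != ADDRESS_LEN or len(remote_composer) != ADDRESS_LEN:
        raise ValueError("addresses must be 20 bytes")
    return remote_composer, original_to + user_payload


def unwrap(wrapped_payload: bytes):
    """Destination Composer: recover the real recipient and its payload."""
    return wrapped_payload[:ADDRESS_LEN], wrapped_payload[ADDRESS_LEN:]


composer = bytes.fromhex("ab" * 20)
recipient = bytes.fromhex("12" * 20)
to, wrapped = wrap(recipient, b"hello", composer)
print(to == composer, unwrap(wrapped) == (recipient, b"hello"))  # True True
```

The effect of this design is that every payload-carrying swap lands on the Composer, a known contract, so the Router's own try/catch never targets arbitrary user code.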

We can see that the Composer (its source is on Etherscan) is much more careful around gas spending, protecting against many edge cases. It reserves extra gas in the caller to handle the catch clause, stores the payload hash instead of the entire payload, and doesn't log the payload in an event.
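The two ideas can be sketched as follows. RESERVE, gas_for_callee and failure_record are hypothetical, the reserve value is illustrative only, and hashlib.sha3_256 merely stands in for the EVM's keccak256 (the two use different padding and produce different digests).

```python
import hashlib

RESERVE = 30_000  # illustrative gas kept back for the catch-path bookkeeping


def gas_for_callee(gas_left: int, requested: int, reserve: int = RESERVE) -> int:
    """Forward at most `requested` gas without dipping into the caller's reserve."""
    return max(0, min(requested, gas_left - reserve))


def failure_record(payload: bytes) -> bytes:
    """Persist a fixed-size digest instead of the payload: one slot, whatever its size."""
    return hashlib.sha3_256(payload).digest()


print(gas_for_callee(100_000, 50_000))        # 50000
print(gas_for_callee(60_000, 50_000))         # 30000
print(len(failure_record(b"\x00" * 10_000)))  # 32
```

With a 32-byte digest, the catch path costs a single storage write no matter how large the payload is, instead of the hundreds computed for the original Router.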

An interesting question comes up: how does Stargate ensure users are playing ball and calling the Router only through the Composer? None of the code was redeployed, after all. It turns out they do it through a new Relayer. This is extremely odd, as Relayers aren't delegated any security role; that is handled by the ULN and Validation modules. We tried calling the Router directly on a mainnet fork and hit the following revert:

It occurs in the call stack below:

The reverting Relayer implementation isn't verified on Etherscan at the time of writing (we'll leave it to the reader to decide whether that's reasonable for a $300M TVL protocol). Presumably, it checks whether the payload destination is the Composer on the destination chain. In other words, Stargate found a way to hook the swap() call without redeploying the Router.
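Since the Relayer implementation isn't verified, the following is only a speculative model of the check inferred above, applied when a swap is sent on the source chain: swaps without a payload pass, and swaps with a payload must target the destination Composer. relayer_accepts and its parameters are hypothetical.

```python
def relayer_accepts(dst_composer: str, to: str, payload: bytes) -> bool:
    """Speculative source-side filter: payload-carrying swaps must target the Composer."""
    if not payload:
        return True                             # plain swaps never reach sgReceive()
    return to.lower() == dst_composer.lower()   # hex addresses compared case-insensitively


composer = "0xAbC0000000000000000000000000000000000001"
print(relayer_accepts(composer, "0x1111111111111111111111111111111111111111", b"\x01"))  # False
print(relayer_accepts(composer, composer.lower(), b"\x01"))                               # True
```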

Bug theory

The bugs covered in part 2 are an example of a good offensive approach:

On one side, we focused on bug heuristics, i.e. code that is very easy to get wrong. Here, those were callbacks, gas-sensitive code and try/catch.
On the other side, we came in with a clear idea of the set of impacts we were looking for. For LZ applications, blocking the queue is a highly attractive target.

From a defensive mindset, a comprehensive invariant-testing suite should be able to detect both issues, since they aren't deep business-logic bugs.
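As a rough illustration, here is a random-input invariant check against a toy model of the pre-fix Bridge, standing in for a proper invariant suite. The invariant is "delivery never reverts when supplied with the paid-for 175,000 gas" (callback gas is budgeted separately and ignored). deliver, LOGIC_GAS and its 60,000 value are hypothetical; the model only encodes the two bugs described above.

```python
import math
import random

BASE_GAS = 175_000   # paid-for gas, excluding dstGasForCall
LOGIC_GAS = 60_000   # hypothetical cost of the Bridge/Router/Pool bookkeeping
SLOT_WRITE = 22_100  # cold zero -> non-zero SSTORE


def deliver(target_is_contract: bool, payload_len: int, callback_reverts: bool):
    """Gas used by a toy pre-fix Bridge delivery, or None if it reverts outright."""
    gas = LOGIC_GAS
    if payload_len == 0:
        return gas                                       # sgReceive() is never called
    if not target_is_contract:
        return None                                      # bug 1: try/catch on a non-contract
    if callback_reverts:
        gas += math.ceil(payload_len / 32) * SLOT_WRITE  # bug 2: catch stores the payload
    return gas


def invariant_holds(case) -> bool:
    used = deliver(*case)
    return used is not None and used <= BASE_GAS         # exceeding BASE_GAS means OOG


rng = random.Random(0)
cases = [
    (rng.random() < 0.5, rng.randint(0, 10_000), rng.random() < 0.5)
    for _ in range(1_000)
]
violations = [case for case in cases if not invariant_holds(case)]
print(len(violations) > 0)  # True
```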

Thanks for joining us for part 2. We've gone from re-discovering a 1-year-old issue to re-discovering a 1-week-old issue. In part 3, we'll finally have more luck and be the first to find a high-severity freezing bug in LayerZero.

Trust
security
