THE LAB #107: Reversing Shopee's native crypto with Ghidra
Shopee hides its crypto in a native library. We read it in Ghidra and rebuild it in Python, byte for byte.
Shopee is one of the largest marketplaces in Southeast Asia, and like most big apps, its mobile API is a better scraping target than its website. The app talks to the backend in plain JSON over HTTPS, the endpoints are stable, and the anti-bot layer is usually lighter than the one guarding the web frontend. We covered the easy version of this in The Lab #12, where Charles and JADX were enough to read an Android app’s traffic and replay it.
Shopee does not hand you the easy version. Capture a request, replay it, and the backend answers with HTTP 418 and a security error code. Every API call carries a set of anti-fraud headers, and the code that builds them is not in the Java you can read with JADX. It sits in native code, in compiled .so libraries, which is exactly where traffic interception and a Java decompiler stop being useful. Open the app in JADX and the signing method is there in name only, declared native, with its body on the far side in ARM machine code.
This is a two-part investigation into how Shopee signs its API requests and how you reproduce that signing yourself. The strategy is the one that works on most hardened apps. You locate the native security libraries, open them in a disassembler, and turn what they do back into something you control. When a library is readable crypto, you reimplement it in Python and sign offline, at any volume, with no app in the loop. When it is a bytecode virtual machine you cannot practically rewrite, you keep the app running and drive its own signer as an oracle. We chose this route because it is the one that scales. An offline signer, or an oracle you call, runs inside your scraper on a server. A rooted phone you have to babysit does not.
This first part is the foundation. We take one of Shopee’s native libraries, libshopeeaegis.so, reverse it end to end with Ghidra and rebuild it in Python. Reading the decompiled code identifies every operation as textbook crypto, and our rebuild reproduces those algorithms byte for byte. It is the readable case, the kind you win cleanly, and the clearest worked example of the method. The second part takes on the harder library, the one that computes the per-request signature, and gets past it with the oracle approach.
What you take from each part depends on your goal. If Shopee is your target, the payoff is the full picture of its request signing across both parts. If you scrape other apps, the method matters more than the marketplace. Most apps that protect their API at all push the work into a native library, and a large share of those are plain, readable crypto you can reproduce. We work it out on Aegis here, and it is the same move on the next app you open.
The tools
We use four tools, each doing one job.
androguard is a Python library for static APK analysis. We use it for fast recon. It lists the native libraries an app ships and finds which classes declare native methods. It does not give you readable source. It gives you structure you can script.
JADX decompiles Dalvik bytecode back to Java. It is how you read the managed side of the app and find the exact class and method that crosses into native code. JADX stops at the native keyword, which is the handoff point to the next tool.
Ghidra is the NSA’s open source reverse engineering framework. It disassembles a .so and decompiles it to pseudo C. It is the only tool here that can read native code, and it is the one this article leans on.
Frida injects a JavaScript engine into a running process so you can hook and call functions live. We use it to run the app under instrumentation and confirm our static reading against what the app actually does.
JADX and androguard read the managed code. Ghidra reads the native code. Frida watches the code run. The native library is the one piece only Ghidra can open, so the work centers there.
Modeling the app’s defenses
Before opening anything, it helps to name the layers, because Shopee has several and only one is our target here.
The managed layer is the Java and Kotlin code. It builds requests, attaches headers, and calls into native methods. JADX reads it.
The native layer is a set of .so libraries the app loads. Pull the arm64 split out of the APK with androguard and the security-relevant ones stand out by name. They are libshopeeaegis.so, libshpssdk.so, and libBkeBizSecurity.so, plus libjnihook.so and libshook.so. The last two are a hooking framework and an anti-hook layer, which means the app actively watches for instrumentation. That matters for Frida later.
The request-signing layer sits on top. Two okhttp interceptors, com.shopee.app.network.antifraud.b and .d (they call themselves SecurityNewSapInterceptor and SecurityNewSapPostInterceptor), attach the anti-fraud headers af-ac-enc-sz-token and x-sap-ri to API requests. The values they attach come from libshpssdk.so, the Shopee Security SDK.
We target one layer, libshopeeaegis.so, a general-purpose crypto library the app calls for specific operations. The request signer in libshpssdk.so stays out of scope. It is a bytecode virtual machine, a harder problem that we handle separately, and reproducing the af-ac headers is not the promise here. The promise is that you can take libshopeeaegis.so, understand every operation it performs, and reproduce it byte for byte in Python.
One detail decides whether that promise holds. libshopeeaegis.so loads only when the app needs it, so it is not present at idle. We watched the process maps over a minute of normal browsing and the library never appeared. The crypto we are about to reverse is a toolbox the app reaches for in certain flows, not the thing running on every request.
Getting the library and finding the door
We pulled Shopee PH 3.75.24 (com.shopee.ph) as an XAPK and unzipped it. The native libraries are not in base.apk. For a split bundle they live in config.arm64_v8a.apk. Listing the .so files with androguard and unzip -l shows that libshopeeaegis.so is a small one at 280 KB, which is a good sign. Small means little room for a heavy obfuscator.
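The listing step can be sketched with nothing but the standard library, since an APK is a zip archive. This is a stand-in for unzip -l, not the androguard call we ran; the function name list_native_libs and the default ABI folder are illustrative assumptions.

```python
import zipfile


def list_native_libs(apk_path, abi="arm64-v8a"):
    """Return (file name, uncompressed size) for every .so under lib/<abi>/.

    Hypothetical helper: it only reads the zip directory, so it never
    loads or executes anything from the APK.
    """
    prefix = f"lib/{abi}/"
    with zipfile.ZipFile(apk_path) as apk:
        libs = [
            (info.filename[len(prefix):], info.file_size)
            for info in apk.infolist()
            if info.filename.startswith(prefix) and info.filename.endswith(".so")
        ]
    # Smallest first: small libraries are the cheapest ones to read.
    return sorted(libs, key=lambda item: item[1])


if __name__ == "__main__":
    # Path taken from the article; adjust to wherever the XAPK was unzipped.
    for name, size in list_native_libs("config.arm64_v8a.apk"):
        print(f"{size:>10}  {name}")
```

Sorting by size puts the small, likely readable libraries at the top of the list.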
androguard answers the first question, which library to open. It does not answer the second, how the app calls it. For that we go to JADX and find the class on the Java side. The library registers its native methods against com.shopee.sz.reinforce.Aegis. The class exposes a method fire, overloaded, declared native. Two of the overloads matter:
native byte[] fire(int mode, byte[] data)
native byte[] fire(int mode, byte[] data, byte[] key)
This is the door. The first argument is an integer mode. Then one or two byte arrays. The return is a byte array. JADX cannot show what fire does, because the body is in the .so. So we open the .so.
Reading the library with Ghidra
We ran Ghidra 12.1.2 in headless mode. It runs without a GUI, it scripts cleanly, and it repeats exactly. The workflow is documented in our Ghidra tool skill, which you can use in Claude Code as we did for this test. In short, you import the .so, let auto analysis run, then run a script that decompiles functions to a file.
support/analyzeHeadless /tmp/proj aegis \
-import config.arm64_v8a/lib/arm64-v8a/libshopeeaegis.so \
-scriptPath ./scripts \
-postScript DecompileExport.java out.c \
-overwrite
Auto analysis finished in nine seconds and the decompiler produced 674 functions with zero failures. That number alone tells you this is not a packed or virtualized binary. A protected library fights the decompiler; this one did not.
The first useful function is JNI_OnLoad, which every JNI library runs at load time. Read its pseudo C and it looks up the class com/shopee/sz/reinforce/Aegis and calls RegisterNatives with two methods. That confirms the door from the Java side and tells us the native functions are registered dynamically rather than exported under Java_* names. Dynamic registration is a mild form of hiding, and it is exactly what Ghidra’s JNI handling and a RegisterNatives trace are for.
The C++ symbols survived. That is the break that makes this library readable. The class is Aegis, with methods named missileFire, missileCount, prism, snowDon, tugWar, and parse. There is a second class, TeslaModel, with model_3, model_a, model_b, model_c, model_e, model_s, model_x, model_y, and getNuremberg. The names are deliberately silly, a Tesla and military theme, but they are real symbols, and the structure is intact.
Follow the call chain from the registered native function. The dispatcher is Aegis::prism, a plain switch on the mode integer:
switch(param_1) {
case 0: model_3(...) // one input
case 1: model_x(key, input, ...) // keyed
case 2: model_x(...); model_3(...) // keyed, then case 0
case 3: model_y(...); model_3(...)
case 4: model_e(...)
case 5: model_a(...)
case 6: model_b(...)
case 7: model_s(...); model_3(...)
case 8: model_c(...); model_3(...)
}
One native call selects one of nine operations by an integer, and some operations are a keyed primitive followed by model_3. To name each operation we read two things, the output size and the primitive body.
The output size comes from TeslaModel::getNuremberg(mode, len), which missileFire calls to size the output buffer before doing the work. It returns 16 for mode 4, 32 for mode 5, 64 for mode 6, and 20 for mode 8. Those are the digest sizes of MD5, SHA-256, SHA-512, and SHA-1. For mode 0 it returns the Base64 expansion of the input length. The size function alone half-names the table.
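The size check is easy to reproduce. The sketch below mirrors only the sizes stated above and compares them with what hashlib reports; expected_size and MODE_DIGEST are hypothetical names, not symbols from the library, and the Base64 length assumes standard padded output.

```python
import hashlib


def base64_len(n: int) -> int:
    """Length of padded Base64 output: four characters per 3-byte group."""
    return 4 * ((n + 2) // 3)


# Mode -> digest name, inferred from the sizes the size function returns.
MODE_DIGEST = {4: "md5", 5: "sha256", 6: "sha512", 8: "sha1"}


def expected_size(mode: int, input_len: int) -> int:
    """Hypothetical mirror of the size lookup, for the modes named in the text."""
    if mode == 0:
        return base64_len(input_len)
    if mode in MODE_DIGEST:
        return hashlib.new(MODE_DIGEST[mode]).digest_size
    raise ValueError(f"size for mode {mode} is not stated in the text")


if __name__ == "__main__":
    for mode in (4, 5, 6, 8):
        print(mode, MODE_DIGEST[mode], expected_size(mode, 0))
```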
The bodies confirm the rest, and here the silly names get helpful, because the renamed primitives kept their original suffixes. model3_autopilot is a textbook Base64 encoder. It reads three bytes, writes four, and pads with 0x3d, which is the = character. modelx_autopilot_cbc is AES in CBC mode, recognizable because it XORs each 16 byte block with the previous ciphertext block before the round function. The hash contexts are renamed with a phantom and F theme but keep the gnulib _init_ctx / _process_bytes / _finish_ctx shape. phantom1 is SHA-1, phantom256 is SHA-256, InitF22 is SHA-512. And phantom1 as called by model_c is the HMAC form. It XORs the key with 0x36 for the inner pad and 0x5c for the outer pad over a 64 byte block, which is the HMAC construction.
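For comparison, here is what a three-bytes-in, four-characters-out encoder with 0x3d padding looks like when written by hand. It is a minimal sketch that assumes the standard Base64 alphabet; b64encode_manual is our own name, and the result is checked against Python's base64 module rather than against the library.

```python
import base64

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64encode_manual(data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into one 24-bit group, zero-filling a short tail.
        group = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Emit four 6-bit indices, most significant first.
        quad = bytearray(ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0))
        # Characters that encode only fill bytes become 0x3d, the '=' sign.
        pad = 3 - len(chunk)
        if pad:
            quad[-pad:] = b"\x3d" * pad
        out += quad
    return bytes(out)


if __name__ == "__main__":
    sample = b"aegis"
    print(b64encode_manual(sample), base64.b64encode(sample))
```

If the decompiled loop has this shape and the same 64-character table, the standard library encoder is a drop-in replacement.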
Two of the keyed modes turned out not to be ciphers at all. model_s calls phantom256 with a key and a message and returns 32 bytes, so it is HMAC-SHA256. model_c calls phantom1 the same way and returns 20 bytes, so it is HMAC-SHA1. Reading the bodies kept us honest here. From the signatures alone we had guessed AES.
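The 0x36 and 0x5c pads are the signature of HMAC, and the construction is short enough to write out. The sketch below follows the standard HMAC definition and checks itself against Python's hmac module; hmac_manual is an illustrative name, and the handling of keys longer than the block comes from the standard, not from anything we read in the binary.

```python
import hashlib
import hmac


def hmac_manual(key: bytes, message: bytes, hash_name: str = "sha1") -> bytes:
    block_size = hashlib.new(hash_name).block_size  # 64 for SHA-1 and SHA-256
    # Standard HMAC: hash keys longer than a block, then zero-pad to the block.
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(block_size, b"\0")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.new(hash_name, inner_pad + message).digest()
    return hashlib.new(hash_name, outer_pad + inner).digest()


if __name__ == "__main__":
    key, msg = b"demo-key", b"demo-message"  # placeholder values
    for name in ("sha1", "sha256"):
        ours = hmac_manual(key, msg, name)
        ref = hmac.new(key, msg, name).digest()
        print(name, len(ours), ours == ref)
```

The output lengths, 20 and 32 bytes, line up with what model_c and model_s return.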
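That gives the full table.
Mode | Native call chain | Operation
0 | model_3 | Base64 encode
1 | model_x | AES-CBC with PKCS7 padding
2 | model_x, model_3 | AES-CBC, then Base64
3 | model_y, model_3 | AES-CBC with the IV prepended, then Base64
4 | model_e | MD5
5 | model_a | SHA-256
6 | model_b | SHA-512
7 | model_s, model_3 | HMAC-SHA256, then Base64
8 | model_c, model_3 | HMAC-SHA1, then Base64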
model_x pads with PKCS7 to a 16 byte boundary, then runs AES-CBC. The key length sets the variant, and a 16 byte key gives AES-128. model_y does the same but writes the IV in front of the ciphertext, the standard prepend-the-IV pattern, before the Base64 in mode 3.
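The padding and the chaining can be sketched without committing to an AES implementation. In the code below the block cipher is passed in as a callable, so the PKCS7 and CBC logic stay visible; all function names are ours, and the identity function in the demo is a toy stand-in, not AES. For real output you would pass an AES single-block encryptor from a crypto library.

```python
from typing import Callable

BLOCK = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    # Always adds 1..block bytes; aligned input gets one full block of padding.
    pad = block - len(data) % block
    return data + bytes([pad]) * pad


def cbc_encrypt(plaintext: bytes, iv: bytes,
                encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """PKCS7-pad, then XOR each block with the previous ciphertext block
    (the IV for the first one) before handing it to the block cipher."""
    if len(iv) != BLOCK:
        raise ValueError("IV must be 16 bytes")
    padded = pkcs7_pad(plaintext)
    prev, out = iv, bytearray()
    for i in range(0, len(padded), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(padded[i:i + BLOCK], prev))
        prev = encrypt_block(mixed)
        out += prev
    return bytes(out)


def cbc_encrypt_iv_first(plaintext: bytes, iv: bytes,
                         encrypt_block: Callable[[bytes], bytes]) -> bytes:
    # The prepend-the-IV layout described for model_y: IV, then ciphertext.
    return iv + cbc_encrypt(plaintext, iv, encrypt_block)


if __name__ == "__main__":
    toy = lambda block: block  # NOT a cipher, only makes the chaining visible
    print(cbc_encrypt_iv_first(b"A" * 16, bytes(16), toy).hex())
```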
One value is not in the file. The CBC IV is a fixed 16 byte constant the library keeps at a .bss address. .bss is zero-initialized on disk and filled at runtime, so the IV is set when the library initializes and you cannot read it statically. For the hash, HMAC, and Base64 modes that does not matter, because their output is fully determined by the input and key. For the three AES modes it means byte-identical output needs the real IV, which you read from the live process once the library loads.
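The deterministic modes can already be sketched with the standard library alone. This is a hedged illustration of the dispatcher as read above, not the final reimplementation: the Python signature, the argument order for the HMAC modes (data as message, key as key), and the Base64 step after modes 7 and 8 are assumptions taken from the switch listing.

```python
import base64
import hashlib
import hmac
from typing import Optional


def fire(mode: int, data: bytes, key: Optional[bytes] = None) -> bytes:
    """Sketch of the modes whose output depends only on input and key."""
    if mode == 0:
        return base64.b64encode(data)
    if mode == 4:
        return hashlib.md5(data).digest()
    if mode == 5:
        return hashlib.sha256(data).digest()
    if mode == 6:
        return hashlib.sha512(data).digest()
    if mode in (7, 8):
        if key is None:
            raise ValueError("modes 7 and 8 need a key")
        digest = "sha256" if mode == 7 else "sha1"
        mac = hmac.new(key, data, digest).digest()
        # Assumption: the dispatcher runs model_3 (Base64) on the MAC.
        return base64.b64encode(mac)
    if mode in (1, 2, 3):
        # AES-CBC modes: byte-identical output needs the runtime IV.
        raise NotImplementedError("AES modes need the IV from the live process")
    raise ValueError(f"unknown mode {mode}")


if __name__ == "__main__":
    print(fire(0, b"abc"))
    print(fire(4, b"").hex())
    print(fire(7, b"message", b"key"))
```

Keeping the AES modes as an explicit NotImplementedError makes the missing IV hard to forget.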
As always, the code for the Python reimplementation we are about to show is in our GitHub repository reserved for paying users, inside the folder 107.SHOPEE-GHIDRA.






