Loading lk.img from Mediatek devices into IDA for analysis

Hey! Lately I've been analyzing the stage 2 Mediatek bootloader component: the Little Kernel-based bootstrap system. I figured it might be helpful to describe the process I've used to load it into IDA.

What we'll be working on

If you want to follow along for practice, you can grab this image. It comes from the latest UFS stock ROM for the Unihertz Titan.

As for IDA: I'm using version 7.6. Have no license? I'm sure you'll be able to find this version online. Search for IDA Pro 7.6 leak.

The process

Initial set-up

Got the file? Great. Load it up into the 32-bit (not ida64.exe!) version of IDA. Scroll up the architecture list until you see this:

Click "OK". IDA will ask you if you want to change the processor architecture to ARM. We're loading an ARM image, so... Yeah, you do want to change that. Click "Yes", and the following window will appear:

We need to extract some of that data from our lk-verified.img for IDA to disassemble the code properly. Namely:

  • ROM Start Address
  • Loading Address
  • File Offset

IDA will figure out (most of) the rest once we find those out.

Getting the necessary load values

Time to open up your favorite hex editor, and load the file in there. I will be using Okteta.

File Offset

All Mediatek partition images have a 512-byte header. Most of it is dead weight, at least in the case of our Little Kernel image. This header can be skipped, so our File Offset is 0x200 (512 decimal): this is where the executable code starts. Easy enough!
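If you'd rather do the skipping in a script than in a hex editor, here's a minimal sketch of it. It only assumes what's stated above (a 0x200-byte header followed by code); the `split_image` name is made up for this example.

```python
HEADER_SIZE = 0x200  # 512-byte Mediatek partition header, as described above


def split_image(image: bytes):
    """Split a raw image into (header, code) at the 0x200 boundary."""
    if len(image) <= HEADER_SIZE:
        raise ValueError("image is too small to contain a header and code")
    return image[:HEADER_SIZE], image[HEADER_SIZE:]
```

With a real file you'd read it first, e.g. `header, code = split_image(open("lk.img", "rb").read())`.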

ROM Start / Loading Address

The load address is a bit more involved. Sometimes it's specified in the header data; other times it's set to -1 and we have to extract it from the code section itself.

The Easy Way - header contains the address
The load address may be specified at offset 0x002C in any valid LK image.
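A rough sketch of that check, in case you want to script it. Two things here are my assumptions, not facts taken from the header format: that the field is a 32-bit little-endian value, and that "-1" is stored as 0xFFFFFFFF. The function name is hypothetical too.

```python
import struct

HEADER_SIZE = 0x200       # size of the Mediatek partition header
LOAD_ADDR_OFFSET = 0x2C   # header offset mentioned above


def header_load_address(image: bytes):
    """Return the load address from the header, or None if it is unset (-1)."""
    if len(image) < HEADER_SIZE:
        raise ValueError("image is shorter than the 0x200-byte header")
    # Assumption: the field is a 32-bit little-endian unsigned integer.
    (value,) = struct.unpack_from("<I", image, LOAD_ADDR_OFFSET)
    # Assumption: an unset address (-1) is stored as 0xFFFFFFFF.
    return None if value == 0xFFFFFFFF else value
```

If this returns `None`, the header can't help and the load address has to come from the code itself.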

Unfortunately, we're working with modern Mediatek software and good endings are not allowed, so we'll have to try another way.

The Harder Way - retrieving base address from .start section's data area
Every LK image has a base address loaded from a constant. We can use it as a fail-safe method to get the load address we need. You'll have to look for the hexadecimal pattern 10 FF 2F E1. I'll explain why when we actually get the image to load - it's easier to explain the code when you can see the disassembly. The load address will reside 4 bytes after the end of the pattern.

The code section byte order is little-endian, so we'll have to read the bytes backwards, resulting in 0x56000000. This is our load address. So far, so good!
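The same search can be scripted. This is a minimal sketch that follows the description above literally: find the first occurrence of 10 FF 2F E1 after the header, skip 4 bytes past the end of the pattern, and read the next 4 bytes as a little-endian value. The function name and the `gap` parameter are made up for illustration, and taking the first match is an assumption that may not hold for every image.

```python
import struct

PATTERN = bytes.fromhex("10ff2fe1")  # the byte pattern to look for
HEADER_SIZE = 0x200                  # skip the partition header


def find_load_address(image: bytes, gap: int = 4):
    """Return the 32-bit value located `gap` bytes after the pattern, or None."""
    idx = image.find(PATTERN, HEADER_SIZE)
    if idx < 0:
        return None
    pos = idx + len(PATTERN) + gap
    if pos + 4 > len(image):
        return None
    # "<I" reads the four bytes as little-endian, i.e. "backwards".
    (value,) = struct.unpack_from("<I", image, pos)
    return value
```

For the image used in this post, the expected result would be 0x56000000; treat anything that doesn't look like a sane memory address as a sign that the first match was the wrong one.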

Consolidating the information so far

After following the above process we now have:

  • The file offset [0x200 (512 dec)], i.e. the place in the file where executable code starts.
  • The load address [0x56000000], i.e. the place in device memory where the LK is located at runtime.
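These two numbers are all you need to translate between positions in the file and addresses in IDA. Here's a small sketch of that arithmetic; the helper names are hypothetical, and the constants are the values found for this particular image.

```python
FILE_OFFSET = 0x200        # where the code starts in the file
LOAD_ADDRESS = 0x56000000  # where that code lives at runtime


def file_offset_to_address(offset: int) -> int:
    """Map a position in the .img file to its runtime address."""
    if offset < FILE_OFFSET:
        raise ValueError("offset lies inside the header, not in the code")
    return LOAD_ADDRESS + (offset - FILE_OFFSET)


def address_to_file_offset(address: int) -> int:
    """Map a runtime address back to a position in the .img file."""
    if address < LOAD_ADDRESS:
        raise ValueError("address is below the load address")
    return FILE_OFFSET + (address - LOAD_ADDRESS)
```

So a byte you spot at file offset 0x210 in the hex editor ends up at 0x56000010 in the disassembly, and vice versa.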

Loading the file and triggering the IDA auto-analysis

Now that we have our data we will plug it into the dialog from before. Take a look at the following picture:

Now we can finally click OK, and IDA will load the file for us, create the proper ROM segment, and take us to the view we'll be spending most of our time staring at in the next entries in this series (how long it'll be until I write them? fuck knows lmao).

But hey, where's the code disassembly? V? You told me the code starts at file offset 0x200????

Yes. The code starts at file offset 0x200. But IDA has no idea it's the entry point. In IDA View-A, click the line saying DCB 7 at ROM:56000000. Once you have the cursor there, press [C] on your keyboard (or right-click and select Code from the context menu). This tells IDA that executable code starts here and that it should use this address as the starting point for auto-analysis. The result is immediately visible:

You can start exploring and modifying the code now, happy hacking! 🩷

Appendix: Why the pattern 10 FF 2F E1?

Remember that pattern I told you to look for with the explanation being just trust me, chief? Well, now that we can see the code properly I can tell you what's going on. I won't be going into details about what the initialization code is doing or why, as that's out of scope. However, this picture should give you a good idea as to why we're looking for that exact hex sequence.
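If you want to sanity-check the bytes themselves: read as a little-endian 32-bit word, 10 FF 2F E1 becomes 0xE12FFF10, which is the ARM (A32) encoding of BX R0, a branch to the address held in R0. The sketch below only decodes that one instruction form; the function name is made up, and it says nothing about what the surrounding start-up code does.

```python
import struct


def decode_bx(word_bytes: bytes):
    """Decode a 4-byte little-endian ARM word as BX Rm.

    Returns (condition, rm) or None if the word is not a BX instruction.
    """
    (word,) = struct.unpack("<I", word_bytes)
    # BX Rm is encoded as: cond 0001 0010 1111 1111 1111 0001 Rm
    if word & 0x0FFFFFF0 != 0x012FFF10:
        return None
    condition = word >> 28  # 0xE means "always"
    rm = word & 0xF         # register number holding the branch target
    return condition, rm


print(decode_bx(bytes.fromhex("10ff2fe1")))
```

The condition field comes out as 0xE (always execute) and the register as 0, hence BX R0.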

See you in the next unhinged rambling!

vdd - January 21, 2024 • titan