A Comparison of Tools for Binary Diffing

Binary diffing is a useful method for debugging issues in new builds, reverse engineering exploits from security patches, and updating tools to work with modified binary layouts. This article will compare:

For our test we will compare two DLLs while considering:

-rw-r--r-- 1 user user  28M Oct 18 18:22 big-A.dll
-rw-r--r-- 1 user user  18M Oct 18 18:22 big-B.dll
-rw-r--r-- 1 user user 7.2M Oct 18 16:27 small-A.dll
-rw-r--r-- 1 user user 7.2M Oct 18 16:27 small-B.dll
  1. Ease of installation
  2. Ease of use (includes overhead)
  3. Accuracy of detecting differences
  4. Extensibility

Intro to Diffing

Textual diffing is a common practice when reviewing patches in source-code.

        l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC,
                           sizeof(rfc), (unsigned long) &rfc, endptr - ptr);

-       if (test_bit(FLAG_EFS_ENABLE, &chan->flags)) {
+       if (remote_efs &&
            test_bit(FLAG_EFS_ENABLE, &chan->flags)) {
            chan->remote_id = efs.id;
            chan->remote_stype = efs.stype;
            chan->remote_msdu = le16_to_cpu(efs.msdu);

or when you are adding code to a Git repository. How does this work though? Textual diffing revolves around finding the longest common subsequence between two texts.

A B C D E F
A   C D E  G

We can detect “edits” by identifying which strings have to be inserted or deleted to arrive at the new string. If we want to improve the output for code then we can operate on lexemes or entire lines instead of individual characters. Naturally, this increases complexity because the algorithm needs the ability to parse the input text as the target language.

Intro to Binary Diffing

Binary diffing is more complicated because we have to determine what a “difference” means.

Depending on your use-case, some of these perspectives might have no value or be invaluable.

To help illustrate this, consider the snippet below.

fn main() {
    let image = include_bytes!("cat.jpg");
}

Do we care if the image is changed or care how the image is changed?

If we want semantically aware diffs then we need dedicated parsers for each file format (e.g. ELF, PE32). Furthermore, we might need additional parsers for any data inside the files (e.g. x64, ARMv8, Unicode, ASN.1, WAV, etc.).

Assembly

mov eax, 1337
- sub eax, 1337
+ add eax, 1337

Bytes

b8 39 05 00 00
- 2d 39
+ 05 39
05 00 00

Pseudo-Code

- RegWrite(eax, 0)
+ RegWrite(eax, 2674)

Parsers capable of answering format specific questions introduce a lot of complexity and overhead so it is important to know the capabilities of your chosen diffing tool.

Diaphora

Diaphora is an open-source diffing tool boasting an impressive feature-set. The features I’m interested in are as follows:

If you are interested in the Diaphora heuristics, they are described in the docs/ folder of the GitHub repository.

Ease of installation

Diaphora is very straightforward to install.

$ git clone https://github.com/joxeankoret/diaphora.git

Then you can point IDA Pro to the diaphora.py script in the Command File prompt.

Ease of use

Diaphora serializes diffing information into a Sqlite database. Initially, we export a Sqlite image for the first target using IDA Pro. Then we load the second target into IDA. This time, we can select to diff the second target against the Sqlite database of the first target.

If you’re interested, the database schema is available here.

create table if not exists functions (
                          id integer primary key,
                          name varchar(255),
                          address text unique,
                          nodes integer,
                          edges integer,
                          indegree integer,
                          outdegree integer,
                          size integer,
                          instructions integer,
                          mnemonics text,
                          names text,
                          prototype text,
                          cyclomatic_complexity integer,
                          ...

You can configure some of the heuristics and information used by and stored in the database. This allows you to reduce the analysis time and database size for large targets. In my case, I disabled slow heuristics then disabled exporting instructions.

Diaphora Options

Attempting to export the database took a long time… a long time. The time to export big-A.dll and big-B.dll took over 50 minutes, each. The final file of the sqlite databases were ~130MB each.

Accuracy of detecting differences

Once the databases were made, Diaphora attempted to diff them. However, the diffing process took over an hour then crashed before results were produced. I think Diaphora consumes a huge amount of memory which causes it to be OOM killed. Perhaps Diaphora would benefit from streaming or being written in a more efficient language (e.g. C or a language with C-extension interop).

To actually produce results, I have used Diaphora on small-A.dll and small-B.dll. It took over 20 minutes, each, to export both images but only 21 seconds to diff. It should be noted that Diaphora crashed during the export process but recovered itself. The exports are absolutely huge.

-rw-r--r-- 1 user user 441M Oct 18 16:49 small-A.dll.sqlite
-rw-r--r-- 1 user user 439M Oct 18 16:48 small-B.dll.sqlite

Diaphora Match

The UI for inspecting differences is very pleasant for Diaphora. It is easy to compare differences between images. In addition, the matches seem to be very reliable which is nice. You can easily see which instructions have been changed between the two snippets. This patch mode also exists for the IDA pseudo-code view (these .NET DLLs don’t have pseudocode).

Extensibility

Diaphora is written in pure Python so it can be easily modified to add or remove heuristics. You can add a new heuristic here.

HEURISTICS.append({
    "name": "Coin Toss",
    "category": "Partial",
    "ratio": HEUR_TYPE_RATIO,
    "sql": """
        SELECT abs(random()) / 9223372036854775807.0
        %POSTFIX%
    """,
    "flags": [HEUR_FLAG_SAME_CPU]
})

However, the heuristic is easy to add if it can utilise the sqlite3 export. It is possible to add new fields to the database but will require some awkward restructing of code.

BinDiff

BinDiff is a renowned diffing tool with the following capabilities:

If you are interested in the BinDiff heuristics, they are described here.

Ease of installation

Installation is also straightforward. You can follow the instructions here to download a binary from the releases page then point BinDiff to the IDA installation.

Ease of use

Once BinDiff is configured, you can hit Ctrl+6 in IDA to export information about the currently loaded file as a .binExport. After both files have been exported, you can diff the exports.

The .binExport is another sqlite3 database with the following schema. An excerpt is included below.

  NA_RETURN_IF_ERROR(database_.Execute(
      "CREATE TABLE basicblock ("
      "id INT,"
      "functionid INT,"
      "address1 BIGINT,"
      "address2 BIGINT,"
      "algorithm SMALLINT,"
      "evaluate BOOLEAN,"
      "PRIMARY KEY(id),"
      "FOREIGN KEY(functionid) REFERENCES function(id),"
      "FOREIGN KEY(algorithm) REFERENCES basicblockalgorithm(id)"
      ")"));

I was able to export both big-A.dll and big-B.dll in ~30 minutes then diff them after 22 minutes.

BinDiff Options

In order to compare with Diaphora, I diffed both small-A.dll and small-B.dll. The diff took less than 5 minutes (including export).

-rw-r--r-- 1 jack jack 30M Oct 18 17:13 small-A.dll.BinExport
-rw-r--r-- 1 jack jack 24M Oct 18 17:13 small-A.dll_vs_small-B.dll.BinDiff
-rw-r--r-- 1 jack jack 30M Oct 18 17:13 small-B.dll.BinExport

The file sizes of the exports are pretty reasonable compared to the original file size.

Accuracy of detecting differences

BinDiff was pretty decent at detecting similar functions but manual review was necessary for a large number of functions which BinDiff was unable to match.

These are functions which BinDiff was able to successfully match.

BinDiff match