Scanning Binaries For PE Format Anomalies

By: ThreatLabz


After processing a large number of malicious binaries, I would like to share my findings on anomalies in PE binaries. This information should help security researchers validate suspicious samples and cluster them.

1. Byte strings near the entry point (EP)

Of course, the bytes at the entry point (EP) are a very popular basis for AV vendors' malware signatures, so I cover them first. The most frequent EP string among the malicious samples is 81ec8001000053555633db57895c2418c74424103091400033, which is stack setup code: it decodes to sub esp, 180h followed by register pushes. The second most frequent is 60e803000000e9eb045d4555c3e801000000eb5dbbedffffff.
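To collect statistics like these, each sample's EP bytes first have to be read out of the file. Below is a minimal stdlib-only Python sketch, not the tooling used for this study: it reads AddressOfEntryPoint from the optional header, maps that RVA to a file offset through the section table, and returns the first bytes as hex. The 25-byte default is an assumption based on the length of the strings quoted above, and `ep_hex` is a hypothetical helper name.

```python
import struct


def ep_hex(data: bytes, n: int = 25) -> str:
    """Return the first n bytes at the entry point as lowercase hex.

    Minimal sketch for PE32/PE32+ files; it raises ValueError on input
    it cannot parse rather than guessing.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    pe_off, = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4                                  # COFF file header
    num_sections, = struct.unpack_from("<H", data, coff + 2)
    opt_size, = struct.unpack_from("<H", data, coff + 16)
    opt = coff + 20                                    # optional header
    # AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+.
    ep_rva, = struct.unpack_from("<I", data, opt + 16)

    offset = None
    table = opt + opt_size                             # section table
    for i in range(num_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        vsize, va, raw_size, raw_ptr = struct.unpack_from("<4I", data, table + 40 * i + 8)
        if va <= ep_rva < va + max(vsize, raw_size):
            offset = ep_rva - va + raw_ptr
            break
    if offset is None:
        # Assumption: an EP outside every section points into the headers,
        # where the RVA equals the file offset.
        offset = ep_rva

    chunk = data[offset:offset + n]
    if not chunk:
        raise ValueError("entry point maps outside the file")
    return chunk.hex()
```

For example, reading a (hypothetical) `sample.exe` in binary mode and passing its contents to `ep_hex` yields a 50-character hex string comparable to the ones above. A production tool would more likely rely on a full parser such as pefile, which also handles the file-alignment rounding of PointerToRawData that this sketch ignores.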

This finding is quite similar to the results of another research work, which listed the top-10 EP strings as follows:



    Count   EP bytes
    35498   55 8B EC 6A FF 68
    22712   55 8B EC 83 C4 F0
    14775   55 8B EC 53 8B 5D
     7711   4D 5A 90 00 03 00
     6959   55 8B EC 83 C4 C4
     5775   4D 5A 50 00 02 00
     3497   55 8B EC 83 C4 F4
     3190   60 E8 00 00 00 00
     3080   83 7C 24 08 01 75
     2152   55 8B EC 83 C4 B4
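The table above is essentially a frequency count of 6-byte EP prefixes. The following Python sketch shows that tally with `collections.Counter`, assuming each sample's EP bytes are already available as a hex string; `top_ep_prefixes` is a hypothetical name, and samples with fewer than `prefix_len` EP bytes are simply skipped.

```python
from collections import Counter


def top_ep_prefixes(ep_hex_strings, prefix_len=6, k=10):
    """Count the first prefix_len EP bytes across samples.

    Returns (prefix, count) pairs, most frequent first, with each prefix
    formatted like the table above, e.g. "55 8B EC 6A FF 68".
    """
    counts = Counter()
    for h in ep_hex_strings:
        raw = bytes.fromhex(h)
        if len(raw) < prefix_len:
            continue  # too few EP bytes to compare; skip the sample
        counts[raw[:prefix_len].hex(" ").upper()] += 1
    return counts.most_common(k)
```

Note that rows beginning with `4D 5A` are the ASCII bytes "MZ", the DOS header signature, which suggests those samples' EP pointed back into the file headers.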




2. Section names

For each sample, I concatenated its section names into a single string. Here are the most frequent ones.

UPX is still malware writers' favorite packer, followed by UPack.

The figure above shows how many samples have each number of sections, sorted in descending order. Most malicious samples have 3 sections.
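Both statistics in this section come from the section table. Here is a minimal stdlib-only Python sketch of how they could be computed; it is not the code behind the figures. `read_sections` returns each raw 8-byte section name, `concat_names` joins them (for example `UPX0UPX1.rsrc`), and `section_count_distribution` counts how many samples have each section count. All three names are hypothetical. Joining the raw bytes before decoding keeps a multi-byte Chinese character intact even if it straddles two 8-byte names; falling back to GBK when UTF-8 decoding fails is an assumption about how such names are usually encoded.

```python
import struct
from collections import Counter


def read_sections(data: bytes):
    """Return the raw 8-byte Name field of every section header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    pe_off, = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4
    num_sections, = struct.unpack_from("<H", data, coff + 2)
    opt_size, = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    return [data[table + 40 * i:table + 40 * i + 8] for i in range(num_sections)]


def concat_names(raw_names) -> str:
    """Join section names into one string, e.g. "UPX0UPX1.rsrc".

    Names are joined as bytes *before* decoding so that a multi-byte
    character split across two 8-byte names is rebuilt correctly.
    The GBK fallback is an assumption about how Chinese names are encoded.
    """
    joined = b"".join(name.rstrip(b"\0") for name in raw_names)
    try:
        return joined.decode("utf-8")
    except UnicodeDecodeError:
        return joined.decode("gbk", errors="replace")


def section_count_distribution(samples):
    """Return (section count, number of samples) pairs, most common first."""
    return Counter(len(read_sections(d)) for d in samples).most_common()
```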

I also picked out some amusing Chinese section names:
BY 小广

Here is the longest one:


Google Translate result:

National Day special edition birthday of the motherland motherland's prosperity and people's well being I wish you all good health and success in your work all wishes come true and good luck. And a big smile!

3. Anomaly score

I defined about 10 anomaly features and combined them into a total anomaly score for each sample.
The following is the score distribution.
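The features themselves are not listed here, so the following Python sketch uses a hypothetical set of seven common PE anomaly checks purely to illustrate summing feature hits into one score. `PEFeatures`, the check names, the section-count thresholds, and the default weight of 1 per hit are all assumptions, and a PE parser would have to fill in the fields. `score_distribution` then tallies scores across samples, which is the data a score distribution plot is drawn from.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PEFeatures:
    """Hypothetical per-sample fields a PE parser would fill in."""
    num_sections: int
    ep_section_index: int          # -1 when the EP lies outside every section
    ep_section_writable: bool
    section_names: List[str]
    zero_raw_size_sections: int    # sections whose SizeOfRawData is 0


# Hypothetical anomaly checks, not the author's feature list.
CHECKS = {
    "ep_outside_sections": lambda f: f.ep_section_index < 0,
    "ep_in_last_section": lambda f: f.num_sections > 1 and f.ep_section_index == f.num_sections - 1,
    "ep_section_writable": lambda f: f.ep_section_writable,
    "non_ascii_section_name": lambda f: any(not n.isascii() for n in f.section_names),
    "empty_section_name": lambda f: any(n == "" for n in f.section_names),
    # The thresholds 2 and 10 are arbitrary placeholders.
    "unusual_section_count": lambda f: f.num_sections < 2 or f.num_sections > 10,
    "zero_raw_size_section": lambda f: f.zero_raw_size_sections > 0,
}


def anomaly_score(f: PEFeatures, weights: Optional[Dict[str, int]] = None) -> Tuple[int, List[str]]:
    """Sum the weights of every triggered check (default weight 1)."""
    weights = weights or {}
    hits = [name for name, check in CHECKS.items() if check(f)]
    return sum(weights.get(name, 1) for name in hits), hits


def score_distribution(scores) -> List[Tuple[int, int]]:
    """Return (score, number of samples) pairs in ascending score order."""
    return sorted(Counter(scores).items())
```

Equal weights keep the score easy to interpret; passing a `weights` dict lets stronger indicators count for more without changing the checks.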


