Hunting Masquerading Executables: The Significance of the MZ header

Intro

A common way for adversaries to avoid detection when executing malware is to disguise their tools as something that appears harmless. MITRE describes the techniques that various actors have used under Defense Evasion: T1036 Masquerading. Say, for instance, an adversary coerces a user into running a malicious macro inside a Microsoft Office file, and the macro downloads a remote malicious payload. The payload then changes the system’s file association so that a program such as PowerShell runs files with a ‘.img’ extension. Let’s dive into detecting these files on a host.

Research

Many file formats begin with a particular sequence of bytes (“magic bytes”) in the header. This file signature can be used to identify a file type regardless of the extension, or lack thereof. For DOS and Windows executables, the signature is the two ASCII characters “MZ” at the very start of the file, which open the DOS header. That header records, among other things, the size of the header and the program’s initial entry point for DOS. In modern Windows (PE) executables it also holds, at offset 0x3C, the location of the PE header that the operating system uses to load the file into memory and begin execution.
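As a rough illustration of reading a file signature, the sketch below opens a file, reads only its first few bytes, and returns them as a hex string. `Get-FileSignature` and the `signature-demo.img` file name are hypothetical names chosen for this example; they are not part of the script discussed later.

```powershell
# Hypothetical helper (not part of the original script): return the first
# $Count bytes of a file as a spaced hex string such as '4D 5A 90 00'.
function Get-FileSignature {
    param(
        [Parameter(Mandatory)] [string] $Path,   # pass a full path; .NET ignores the PowerShell location
        [int] $Count = 4
    )
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # only $Count bytes are read, no matter how large the file is
        $buffer = New-Object byte[] $Count
        $read = $stream.Read($buffer, 0, $Count)
    }
    finally {
        $stream.Dispose()
    }
    if ($read -eq 0) { return '' }
    # BitConverter yields '4D-5A-90-00'; swap the dashes for spaces
    return [System.BitConverter]::ToString($buffer, 0, $read).Replace('-', ' ')
}

# Demo: a file with an image-like extension whose content starts with 'MZ'
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'signature-demo.img'
[System.IO.File]::WriteAllBytes($sample, [byte[]](0x4D, 0x5A, 0x90, 0x00, 0x03))
Get-FileSignature -Path $sample            # 4D 5A 90 00
Get-FileSignature -Path $sample -Count 2   # 4D 5A
Remove-Item -LiteralPath $sample
```

The extension never enters the check, which is the point: the first bytes identify the format even when the name says otherwise.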

While this blog will focus on the significance of the MZ header, you can read more about file signatures from Wikipedia and Gary Kessler, both of which maintain useful lists.

“MZ” stands for Mark Zbikowski, one of the architects of the original executable file format for MS-DOS.

During my research a couple of years ago, I came across a SANS poster that I can no longer locate, so I am unable to properly credit the source I built on. It provided some PowerShell code that identified executables by comparing the first four bytes of a file against ‘4D 5A 90 00’. The ‘4D 5A’ part is the hex form of the ASCII characters “MZ”. The ‘90 00’ part is not padding and not part of the signature: it is the next field of the DOS header (the number of bytes used in the file’s last 512-byte page), which happens to be 0x0090 in the stub emitted by Microsoft’s linker. Matching all four bytes therefore finds typical Windows binaries but can miss executables built with other toolchains. The script itself also needed some upgrades:

  1. The original script loaded each file’s entire contents into memory before checking the first few bytes. This caused memory capacity issues when iterating over an entire system.
  2. The formatted structure was difficult to read.
  3. File property extraction did not take place.
  4. Hashes were not pulled when a match was found.
  5. The contents were not written to an output file.
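As a side note on the four-byte comparison, the sketch below contrasts matching ‘4D 5A 90 00’ with matching only the two-byte “MZ” signature. `Test-MzSignature` is a hypothetical helper, and the sample byte arrays are hand-written stand-ins rather than real files; the ‘4D 5A 50 00’ (“MZP”) header is the form commonly seen in Borland/Delphi builds.

```powershell
# Hypothetical helper: true when the byte array starts with 'MZ' (4D 5A)
function Test-MzSignature {
    param([byte[]] $Bytes)
    return ($Bytes.Length -ge 2 -and $Bytes[0] -eq 0x4D -and $Bytes[1] -eq 0x5A)
}

$microsoftStub = [byte[]](0x4D, 0x5A, 0x90, 0x00)   # common Microsoft linker stub
$delphiStub    = [byte[]](0x4D, 0x5A, 0x50, 0x00)   # 'MZP' stub seen in Borland/Delphi builds
$pngHeader     = [byte[]](0x89, 0x50, 0x4E, 0x47)   # PNG image, not an executable

foreach ($sample in $microsoftStub, $delphiStub, $pngHeader) {
    $hex      = [System.BitConverter]::ToString($sample).Replace('-', ' ')
    $fourByte = $hex -eq '4D 5A 90 00'              # the comparison used by the script
    $twoByte  = Test-MzSignature -Bytes $sample     # signature-only comparison
    '{0}  four-byte match: {1}  two-byte match: {2}' -f $hex, $fourByte, $twoByte
}
```

The second sample matches only the two-byte test. Whether the broader match is worth the extra results depends on the environment, since a looser check also returns more files to triage.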

Now that these changes have been implemented, it is time to test the outcome.

Hunting

To make sure the concept works as expected, I’ve renamed an executable in my C:\temp location from ‘Sysmon.exe’ to ‘Sysmon - test.img’.

The script outputs the following results in “C:\temp\TA0005_Defense_Evasion\T1036_Masquerading_Executables.csv” if there are any matches:

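| Full Path | File Name | File Size | Last Access Time | Creation Time | Last Write Time | Base Name | Extension | MD5 | SHA1 | SHA256 | SHA384 | SHA512 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C:\Temp\Sysmon - test.img | Sysmon - test.img | 7.81 MB | <output> | <output> | <output> | Sysmon - test.img | .img | <output> | <output> | <output> | <output> | <output> |

Output results

Script
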
<#

.SYNOPSIS
This PowerShell script is designed to assist Cyber Security Incident Response teams in identifying executable files that may be attempting to conceal their true nature by using an extension that is uncommon for their file type. The script does this by checking the magic number sequences, also known as "magic bytes," which are specific bytes that appear at the beginning of a file and can be used to identify the file type.

.DESCRIPTION
find_masquerading_executables.ps1

.EXAMPLE
.\find_masquerading_executables.ps1

.NOTES
Modify the following variables before execution:
1. $csvfile
2. $root_dirs
3. $ignore_these_extensions (depends on your environment)

.LINK
https://www.sans.org/
Improved based on an old SANS article I can no longer find to reference. Changes made were:
1. Restructured format
2. No longer reads in all file contents to memory
3. File property extraction
4. Hash extraction

.OUTPUTS
$csvFile

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#>

# create a CSV file to hold final output
$csvFile = "C:\temp\TA0005_Defense_Evasion\T1036_Masquerading_Executables.csv"

# specify where to begin search
$root_dirs = "C:\"

# Define an array of algorithms to use
$Algorithms = @("MD5", "SHA1", "SHA256", "SHA384", "SHA512")

# create an array to hold the results
$masquerading_executables = @()

# ignore these extensions
$ignore_these_extensions = '.exe','.scr','.sys','.dll','.fon','.cpl','.iec','.ime','.rs','.tsp','.node','.bak','.acm','.ax','.ocx','.olb','.vbx','.vxd','.386','.api','.flt','.zap'

# iterate through the root directories
foreach ($root_dir in $root_dirs)
{
     #get a list of all files in the root directory and its subdirectories
     $items = Get-ChildItem -Path $root_dir -Recurse -File

     #iterate through the list of files
     foreach ($item in $items)
     {
          #ensure the item is not an ignored extension
          if ($item.extension -notin $ignore_these_extensions)
          {
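               # read the first 4 bytes of the file as a byte array
               # (-Encoding Byte is Windows PowerShell 5.1 syntax; PowerShell 6+ uses -AsByteStream instead)
               $magicbytes = Get-Content -LiteralPath $item.FullName -Encoding Byte -ReadCount 4 -TotalCount 4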

               # convert the byte array to a hexadecimal string
               $hexstring = '{0:X2}' -f $magicbytes

               # check if the hexadecimal string matches the magic bytes
               if ($hexstring -eq '4D 5A 90 00')
               {

                    # add the full path of the file to the results array
                    $masquerading_executables += $item.FullName

               }
          }
     }
}

# Continue only if the $masquerading_executables array is not empty
if ($masquerading_executables.Count -gt 0) {

    # Create the output folder (taken from $csvFile) if it does not exist
    $csvFolder = Split-Path -Path $csvFile -Parent
    if (!(Test-Path -Path $csvFolder)) {
        New-Item -ItemType Directory -Path $csvFolder | Out-Null
    }

    # build one row per matching file
    $rows = foreach ($executable in $masquerading_executables)
    {
        # get the file properties
        $fileProperties = Get-Item -LiteralPath $executable

        # calculate the file size in a more readable format
        if ($fileProperties.Length -ge 1GB)
        {
            # file size is at least 1 GB, so convert to GB
            $fileSize = "{0:N2} GB" -f ($fileProperties.Length / 1GB)
        }
        elseif ($fileProperties.Length -ge 1MB)
        {
            # file size is at least 1 MB, but less than 1 GB, so convert to MB
            $fileSize = "{0:N2} MB" -f ($fileProperties.Length / 1MB)
        }
        else
        {
            # file size is less than 1 MB, so convert to KB
            $fileSize = "{0:N2} KB" -f ($fileProperties.Length / 1KB)
        }

        # collect the columns in an ordered hashtable so the column order is fixed
        $row = [ordered]@{
            'Full Path'        = $executable
            'File Name'        = $fileProperties.Name
            'File Size'        = $fileSize
            'Last Access Time' = $fileProperties.LastAccessTime
            'Creation Time'    = $fileProperties.CreationTime
            'Last Write Time'  = $fileProperties.LastWriteTime
            'Base Name'        = $fileProperties.BaseName
            'Extension'        = $fileProperties.Extension
        }

        # add one hash column per algorithm (MD5, SHA1, SHA256, SHA384, SHA512)
        foreach ($Algorithm in $Algorithms) {
            $row[$Algorithm] = (Get-FileHash -LiteralPath $executable -Algorithm $Algorithm).Hash
        }

        # emit the row as an object for Export-Csv
        [pscustomobject]$row
    }

    # write the header and rows to the CSV file, replacing any previous results;
    # Export-Csv quotes values, so commas in file names or sizes do not break columns
    $rows | Export-Csv -Path $csvFile -NoTypeInformation -Encoding UTF8
}

Conclusion

The MZ header is a crucial component in identifying executable files. Its signature occupies the first two bytes of an executable in the DOS and Windows operating systems and marks the start of the DOS header. That signature shows that a file is in an executable format no matter what its extension claims, and the header that follows describes the size of the program image and where the program code is located. Checking the MZ header is a fast and efficient way to surface masquerading executables, and it continues to play an important role in today’s computer systems.
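As a possible follow-up check, the sketch below goes one step past the “MZ” signature: it reads the 4-byte offset stored at 0x3C in the DOS header and confirms that the bytes at that offset are the PE signature ‘50 45 00 00’ (“PE” followed by two zero bytes). `Test-PeSignature` and the demo file names are hypothetical, and the two demo files are hand-built byte arrays rather than real executables.

```powershell
# Hypothetical helper: true when a file starts with 'MZ' AND the offset stored
# at 0x3C in the DOS header points at the PE signature 'PE\0\0'.
function Test-PeSignature {
    param([Parameter(Mandatory)] [string] $Path)   # pass a full path
    $stream = [System.IO.File]::OpenRead($Path)
    try {
        # the DOS header is 64 bytes long
        $dosHeader = New-Object byte[] 64
        if ($stream.Read($dosHeader, 0, 64) -lt 64) { return $false }
        if ($dosHeader[0] -ne 0x4D -or $dosHeader[1] -ne 0x5A) { return $false }

        # offset of the PE header, stored as a little-endian 32-bit integer
        $peOffset = [System.BitConverter]::ToInt32($dosHeader, 0x3C)
        if ($peOffset -lt 0 -or ($peOffset + 4) -gt $stream.Length) { return $false }

        $stream.Position = $peOffset
        $peSignature = New-Object byte[] 4
        if ($stream.Read($peSignature, 0, 4) -lt 4) { return $false }
        return ([System.BitConverter]::ToString($peSignature) -eq '50-45-00-00')
    }
    finally {
        $stream.Dispose()
    }
}

# Demo with two hand-built 68-byte files
$fakePe = New-Object byte[] 68
$fakePe[0] = 0x4D; $fakePe[1] = 0x5A       # 'MZ'
$fakePe[0x3C] = 64                         # PE header offset = 64
$fakePe[64] = 0x50; $fakePe[65] = 0x45     # 'PE', followed by two zero bytes

$mzOnly = New-Object byte[] 68
$mzOnly[0] = 0x4D; $mzOnly[1] = 0x5A       # 'MZ' but no PE signature

$tempDir = [System.IO.Path]::GetTempPath()
$pePath  = Join-Path $tempDir 'pe-demo.img'
$mzPath  = Join-Path $tempDir 'mz-demo.img'
[System.IO.File]::WriteAllBytes($pePath, $fakePe)
[System.IO.File]::WriteAllBytes($mzPath, $mzOnly)

Test-PeSignature -Path $pePath   # True
Test-PeSignature -Path $mzPath   # False
Remove-Item -LiteralPath $pePath, $mzPath
```

A check like this could be used to rank results, since a file that passes both tests is more likely to be a loadable Windows binary than one that merely starts with “MZ”.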