Analysis on “MalDoc in PDF”
September 7, 2023

Introduction:

In July of this year, JPCERT shared a new “MalDoc in PDF” attack that can evade detection and analysis. The malware is a polyglot: a file that combines two or more file formats so that different applications can each open it as a different file type without error, which helps it evade detection and hinders analysis tools. In this case, the sample can be opened as either a Word file or a PDF, and it embeds an MHT (MIME HTML) file whose ActiveMime object encodes a macro.

Identification:

Source: Triage

SHA256: ef59d7038cfd565fd65bae12588810d5361df938244ebad33b71882dcf683058

AV-Detection: 22/59 (VirusTotal)

Analysis

Static Analysis

The collected sample was a PDF file, a format that often serves as an initial stager. Let’s start the static analysis by opening the file in VS Code. At first glance, its contents look legitimate, with a PDF header and PDF objects.

Scrolling further through the contents, however, reveals a MIME-Version declaration. This is where the MHT (MIME HTML) content starts. Because the sample is a polyglot that can also be opened as a Word file, Word can render this MHT content.
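The same check can be done programmatically. The following is a minimal sketch, assuming the file begins directly with the `%PDF` magic and that the MHT part is introduced by a `MIME-Version:` header; the function name `find_mht_offset` is hypothetical and not part of any tool used in this analysis.

```python
PDF_MAGIC = b"%PDF"
MIME_MARKER = b"mime-version:"  # compared case-insensitively


def find_mht_offset(data):
    """Return the offset of the first MIME-Version header inside a PDF.

    Returns -1 if the data does not start with the PDF magic or if no
    MIME-Version header is present.
    """
    if not data.startswith(PDF_MAGIC):
        return -1
    return data.lower().find(MIME_MARKER)


if __name__ == "__main__":
    # Synthetic stand-in for a polyglot, not the real sample.
    demo = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\nMIME-Version: 1.0\n"
    print(find_mht_offset(demo))
```

Everything from the returned offset onward can then be treated as the MHT portion for further inspection.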

The object inside the MHT content points to an ActiveMime file. ActiveMime is an undocumented Microsoft file format that is often used to encode macros.

Note also that the location (href) referenced by the ‘Edit-Time-Data’ object, which points to the ActiveMime content, is encoded. It was decoded using CyberChef:
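The article does not state which encoding was applied to the href. As an illustration only, the sketch below assumes URL percent-encoding and uses Python's standard `urllib.parse.unquote`; the encoded string is synthetic, built here from the decoded path reported in this analysis, and is not copied from the sample.

```python
from urllib.parse import unquote


def decode_href(href):
    """Decode a percent-encoded href (assumed encoding, see lead-in)."""
    return unquote(href)


# Synthetic input: percent-encode the directory prefix of the known path.
encoded = "".join("%%%02x" % b for b in b"lonhzFH") + "_files/image7891805.jpg"

if __name__ == "__main__":
    print(encoded)
    print(decode_href(encoded))
```

If the sample used a different scheme (for example HTML entities), the corresponding decoder would need to be swapped in.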

Searching in VS Code for the decoded path name ‘lonhzFH_files/image7891805.jpg’ jumps to the corresponding part of the file. Its content (the ActiveMime) is obfuscated text containing many spaces.
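Instead of searching by hand, the matching MIME part can be looked up with Python's standard `email` package. This is a sketch under assumptions: the MHT portion has already been sliced out of the polyglot, it is well-formed enough for the parser, and the part carries a `Content-Location` header ending in the decoded path. A deliberately malformed sample may defeat this approach. The helper name is hypothetical.

```python
import email


def find_part_by_location(mht_bytes, location_suffix):
    """Return the decoded payload of the MIME part whose Content-Location
    ends with location_suffix, or None if no part matches."""
    msg = email.message_from_bytes(mht_bytes)
    for part in msg.walk():
        location = part.get("Content-Location", "")
        if location.endswith(location_suffix):
            # decode=True undoes the Content-Transfer-Encoding (e.g. base64).
            return part.get_payload(decode=True)
    return None
```

Matching on a suffix rather than the full value is a design choice: MHT files written by Word typically store an absolute `file:///` location, so only the tail is expected to match the href.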

The content was copied into CyberChef, which revealed that those apparent spaces were actually CR (Carriage Return) characters.

The content was then decoded from Base64, revealing it to be ActiveMime, which encodes the macro. It was saved as macro.bin.
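The CyberChef steps (remove the stray whitespace, then decode Base64) can be sketched in a few lines of Python. The helper name `decode_activemime` is hypothetical; the check relies on ActiveMime data beginning with the ASCII string `ActiveMime`.

```python
import base64

ACTIVEMIME_MAGIC = b"ActiveMime"


def decode_activemime(text):
    """Strip all whitespace (spaces, CR, LF) and Base64-decode the rest.

    Returns (decoded_bytes, looks_like_activemime).
    """
    compact = "".join(text.split())
    blob = base64.b64decode(compact)
    return blob, blob.startswith(ACTIVEMIME_MAGIC)


if __name__ == "__main__":
    # Synthetic input with CRs and spaces scattered through the Base64 text.
    noisy = "QWN0 \r aXZl \r TWlt \r ZQ=="
    blob, ok = decode_activemime(noisy)
    print(ok, blob)
```

Writing the returned bytes to disk would give the equivalent of macro.bin.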

To search for the embedded macro inside the decoded content, binwalk was used.

binwalk macro.bin

binwalk revealed compressed data, which is the part of the ActiveMime format that encodes the macro. It was extracted, again using binwalk.

binwalk -D='.*' macro.bin

The extraction produced a file named 32.
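A comparable extraction can be approximated without binwalk. The sketch below assumes the compressed region is a standard zlib stream (the article only says "compressed data") and simply tries to decompress from every offset where a typical zlib header byte (0x78) appears. The function name is hypothetical, and this brute-force scan is a substitute technique, not how binwalk works internally.

```python
import zlib


def extract_zlib_streams(blob):
    """Return a list of (offset, decompressed_bytes) for every offset at
    which a zlib stream can be decompressed."""
    results = []
    for offset in range(len(blob)):
        if blob[offset] != 0x78:  # common first byte of a zlib header
            continue
        try:
            # decompressobj tolerates trailing bytes after the stream ends.
            data = zlib.decompressobj().decompress(blob[offset:])
        except zlib.error:
            continue
        if data:
            results.append((offset, data))
    return results
```

In the synthetic test case used to check this sketch, a stream placed at byte 50 is reported at offset 50, which is 0x32 in hexadecimal. The extracted file name 32 above is presumably such a hexadecimal offset, although the article does not confirm this.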

The extracted file should be the decoded malicious macro, so it was checked with oletools. First, oleid was used to check for macros.

oleid 32

oleid found VBA macros in the file.

To extract the VBA macro, olevba was used.

olevba 32

The olevba output shows that the macro downloads an MSI file from its C2 server (hxxps[:]//cloudmetricsapp[.]com/wp-content/uploads/docs/addin[.]msi) and then executes the downloaded file with Office.InstallProduct. Because of AutoExec, the macro runs as soon as the Word file is opened, downloading and installing the MSI file.
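To illustrate the kind of indicators being read out of the macro here, the following sketch scans VBA source text for auto-run procedure names and URLs. It is a rough imitation of this style of triage, not olevba's implementation, and the macro text in the example is hypothetical (it is not the macro from the sample).

```python
import re

# Word procedure names that run automatically on document events.
AUTO_EXEC_NAMES = (
    "AutoOpen", "AutoExec", "AutoClose", "AutoNew", "AutoExit",
    "Document_Open", "Document_Close", "Document_New",
)
URL_RE = re.compile(r"https?://[^\s\"']+", re.IGNORECASE)


def triage_vba(source):
    """Return auto-run procedure names and URLs found in VBA source text."""
    auto = [
        name for name in AUTO_EXEC_NAMES
        if re.search(r"\bsub\s+" + re.escape(name) + r"\b", source, re.IGNORECASE)
    ]
    return {"auto_exec": auto, "urls": URL_RE.findall(source)}


if __name__ == "__main__":
    # Hypothetical macro text for demonstration only.
    demo = 'Sub AutoOpen()\n    u = "https://example.com/addin.msi"\nEnd Sub\n'
    print(triage_vba(demo))
```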

This further verifies that the PDF sample is an initial stager that bypasses detection and downloads another payload, the MSI file.

The URL obtained from the macro was then checked in the Browserling sandbox to see whether it still served the download, but it was unreachable.

Dynamic Analysis:

Let’s now analyze the sample further through dynamic analysis. The PDF file was saved as a Word file and opened. Notice that the ‘Enable Content’ prompt still appears: although the sample was able to bypass detection, it could not bypass this protection measure.

When ‘Enable Content’ is clicked, Word renders the MHT content, decodes the macro from the ActiveMime blob, and executes it to download the MSI file, as discovered during static analysis.

The Wireshark capture shows the domain cloudmetricsapp[.]com being resolved to the IP address 179[.]60[.]147[.]105. The infected host then tried to initiate a connection over port 443 to download the MSI file but got no reply from the server.

It is now clear that this ‘MalDoc in PDF’ is an initial stager that bypasses detection in order to download its final payload. There have been many cases in which one malware family served as the initial stager for others, so this technique could plausibly be adopted by other malware families in the near future. The next section therefore provides a YARA rule for detecting this kind of sample.

Detection with YARA:

rule MalDocinPDF {
    meta:
        description = "Detecting MalDocs in PDF"

    strings:
        // Markers of embedded MHT (MIME HTML) content
        $mht0 = "mime" ascii nocase
        $mht1 = "content-location:" ascii nocase
        $mht2 = "content-type:" ascii nocase
        $mht3 = "Edit-Time-Data" ascii nocase
        // Markers of an Office document inside the MHT
        $doc = "<w:WordDocument>" ascii nocase
        $xls = "<x:ExcelWorkbook>" ascii nocase

    condition:
        // File starts with "%PDF" (0x46445025 read as little-endian)
        (uint32(0) == 0x46445025) and
        (2 of ($mht*)) and
        ( (1 of ($doc)) or
          (1 of ($xls)) )
}
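For readers who want to see the rule's logic spelled out, the condition can be mirrored in plain Python. This is only an approximation for illustration (the function name is hypothetical) and not a replacement for running the rule with YARA itself.

```python
import struct

MHT_MARKERS = (b"mime", b"content-location:", b"content-type:", b"edit-time-data")
OFFICE_MARKERS = (b"<w:worddocument>", b"<x:excelworkbook>")


def looks_like_maldoc_in_pdf(data):
    """Approximate the YARA condition on an in-memory byte string."""
    # uint32(0) == 0x46445025: the first four bytes are "%PDF".
    if len(data) < 4 or struct.unpack_from("<I", data, 0)[0] != 0x46445025:
        return False
    lowered = data.lower()  # emulates 'nocase' for ASCII strings
    mht_hits = sum(marker in lowered for marker in MHT_MARKERS)
    office_hit = any(marker in lowered for marker in OFFICE_MARKERS)
    return mht_hits >= 2 and office_hit
```

Requiring the PDF magic, at least two MHT markers, and an Office marker together is what separates a polyglot from an ordinary PDF or an ordinary MHT file.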

IoC:

The IoCs of ‘MalDoc in PDF’ can be explored in VirusTotal Graph, which offers a great visualization.

Link: (VirusTotal Graph)