File-Based Malware: Considering A Different And Specific Security Approach


The cybersecurity solutions landscape has evolved from simple but effective signature-based scanning solutions to sandboxing—the isolating layer of security between your system and malware—and, most recently, to sophisticated detection methods. The new generation is based on behavioral detection or machine learning in an effort to defend against more sophisticated attacks.

Yet despite these advances, news breaks every day about a new cyberattack. Sure, WannaCry and NotPetya made headlines, but there are other malware strains out there with less fancy names doing just as much damage.

Artificial Intelligence and Machine Learning (AI and ML) detection methods are improvements to the status quo. They’re able to autonomously recognize threats based on patterns and trends of previously available data. They’re then able to start making accurate predictions, becoming proactive rather than just reactive.

Although this approach improves detection rates and reduces false positives, it struggles to identify zero-day (just-born) threats: a model built on previously available data has little to match a brand-new threat against, and it needs ongoing tuning to keep false positives down.

Timing also plays a key role. If you are able to detect the malware, chances are it is already inside your network, and it might be too late.

Considering that CISOs and senior cybersecurity managers are inundated with tasks and responsibilities and saddled with tight budgets, why not apply a specific approach to file-based malware, and take the calculated risk of relying on detection for all other attack vectors?

A major culprit: file-based malware

A data point to consider: According to Verizon’s 2016-2017 Data Breach Report (US ICS-CERT 2015-2016), more than 50 percent of recent attacks originate via file-embedded malware. In addition, more than 90 percent of malicious attacks originate via trusted channels such as email, web and B2B file transfers.

Why are files so often used as a host for malicious code and attacks?

First, it is fairly simple to mutate existing threats into an undetectable form, or to hide known malware in an encrypted file, a zip file or a file embedded within another file, any of which can evade traditional scanning and detection technologies. New and advanced exploits, threats and zero-days continue to challenge the latest generations of detection and response technologies.

Second, files are needed for daily business operations, and you can’t block all of them. Even risky macros in Excel are needed by a segment of employees.

Moreover, in highly secured environments such as critical infrastructure, files are still introduced to the operational technology (OT) network, even though that network is, in most cases, completely isolated or air-gapped. A file-based infection on the OT network could result in significant physical damage and sometimes even loss of life.

Prevention-focused detection with Content Disarm and Reconstruction (CDR)

In order to prevent file-based infection, cybersecurity officers can augment detection technologies with non-detection-based technologies, such as CDR.

CDR is a proven technology that was used for many years in a nascent form by the Israeli armed forces. It can be used to both ensure the safety of files before entering your network and apply security and control for outgoing files. Leveraging redaction capabilities, the content of outgoing files can be searched and replaced, with metadata removed and file re-formatting applied.

Focused on prevention, this approach treats every file as suspicious and eliminates the human decision element (“good to go” vs. stop/quarantine). CDR starts and concludes outside the network, so a file is delivered to the network only after it has been disarmed and is safe to use.

A good CDR solution has both detection and disarm/reconstruction technologies.

Deconstruct and Detect

This first step deconstructs the file, breaking it down to its basic elements. Examples include:

  • Separating the individual files in a zip file
  • Separating files embedded within another file
  • Deconstructing the embedded files in a typical email message (there can be dozens of different files in one email)
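To make the deconstruction step concrete, here is a minimal Python sketch that flattens nested zip archives into their individual files. It is an illustration only, not how any particular CDR product works: the function names `deconstruct` and `make_zip`, the path naming scheme and the `max_depth` limit are all assumptions, and a real engine would also unpack email messages, embedded objects and many other container formats.

```python
import io
import zipfile


def deconstruct(name, data, depth=0, max_depth=5):
    """Recursively flatten nested zip archives into (path, bytes) leaves.

    Hypothetical helper: anything that is not a readable zip, or that is
    nested deeper than max_depth, is returned as a single leaf.
    """
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return [(name, data)]
    leaves = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            child = archive.read(info)
            child_name = f"{name}/{info.filename}"
            leaves.extend(deconstruct(child_name, child, depth + 1, max_depth))
    return leaves


def make_zip(members):
    """Demo helper: build an in-memory zip from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member_name, payload in members.items():
            archive.writestr(member_name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"macro.txt": b"payload"})
    outer = make_zip({"readme.txt": b"hello", "inner.zip": inner})
    for path, content in deconstruct("upload.zip", outer):
        print(path, len(content))
```

The depth limit is there because deeply nested archives are a common way to exhaust a scanner; each leaf can then be handed to the detection engines on its own.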

This approach enables a deep and exhaustive detection process, which includes:

  • Multiple commercial antivirus engines tuned and optimized to work together, alongside next-generation AI and ML detection technologies
  • Multiple true file type engines, which verify that a file is indeed the type it claims to be
  • Other detection engines such as Base64 identification
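Two of the checks above are easy to sketch in Python: verifying that a file's content matches the type its name claims, and spotting Base64-encoded payloads hidden in otherwise ordinary text. This is a simplified illustration under stated assumptions: the signature table covers only a handful of well-known leading bytes, and the helper names and the 24-character threshold are hypothetical. Production engines inspect far more than the first few bytes.

```python
import base64
import binascii
import re

# Illustrative subset of well-known leading "magic" bytes.
MAGIC_SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",  # also the container format for docx/xlsx/pptx
    "exe": b"MZ",
}
ZIP_BASED_EXTENSIONS = {"zip", "docx", "xlsx", "pptx"}

# Runs of Base64 alphabet characters long enough to be worth decoding
# (the threshold of 24 is an arbitrary assumption).
BASE64_RUN = re.compile(rb"[A-Za-z0-9+/]{24,}={0,2}")


def detect_type(data):
    """Return the type implied by the leading bytes, or None if unknown."""
    for kind, magic in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def claims_match_content(filename, data):
    """Check whether the extension agrees with what the bytes look like."""
    extension = filename.rsplit(".", 1)[-1].lower()
    claimed = "zip" if extension in ZIP_BASED_EXTENSIONS else extension
    detected = detect_type(data)
    if claimed in MAGIC_SIGNATURES:
        return detected == claimed
    # Extension not in the table: flag only if the content is a known binary type.
    return detected is None


def find_base64_blobs(data):
    """Decode every plausible Base64 run so it can be inspected like a file."""
    blobs = []
    for match in BASE64_RUN.finditer(data):
        candidate = match.group()
        if len(candidate) % 4:
            continue  # valid padded Base64 is always a multiple of 4 long
        try:
            blobs.append(base64.b64decode(candidate, validate=True))
        except binascii.Error:
            continue
    return blobs


if __name__ == "__main__":
    print(claims_match_content("invoice.pdf", b"MZ\x90\x00"))  # executable posing as a PDF
    hidden = b"note: " + base64.b64encode(b"MZ" + bytes(30)) + b" end"
    print([detect_type(blob) for blob in find_base64_blobs(hidden)])
```

A long run of ordinary letters can also decode without error, so the decoded bytes are only candidates; the point is that they can then be fed back through the same type check and scanning engines.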

This process ensures that known malware is kept out before the disarming and reconstruction process even starts.

Deconstruction followed by a detection process also provides the flexibility to apply different security approaches to different file types (e.g., you might decide not to apply reconstruction to all files).
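One way to picture this per-file-type flexibility is as a small policy table. The sketch below is purely illustrative: the action names, the extensions listed and the default are assumptions, not settings from any real product.

```python
from enum import Enum


class Action(Enum):
    SCAN_ONLY = "scan_only"      # detection engines only, file passed as-is
    RECONSTRUCT = "reconstruct"  # detection, then disarm and reconstruct
    BLOCK = "block"              # never delivered to the network


# Hypothetical policy: which treatment each file type receives.
POLICY = {
    "txt": Action.SCAN_ONLY,
    "docx": Action.RECONSTRUCT,
    "xlsx": Action.RECONSTRUCT,
    "pdf": Action.RECONSTRUCT,
    "exe": Action.BLOCK,
}

# Unknown types get the full treatment, since every file is treated as suspicious.
DEFAULT_ACTION = Action.RECONSTRUCT


def action_for(filename):
    """Look up the action for a file based on its extension."""
    _, dot, extension = filename.rpartition(".")
    extension = extension.lower() if dot else ""
    return POLICY.get(extension, DEFAULT_ACTION)


if __name__ == "__main__":
    for name in ("notes.TXT", "budget.xlsx", "setup.exe", "README"):
        print(name, action_for(name).value)
```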

Disarm and Reconstruct

The disarming or neutralization technique removes all “impurities” that are not approved within the system’s definitions and policies. The final process is reconstruction, in which, using its native drivers, the file is reconstructed back to its purest, native form.

Throughout the process, all the extra contents or impurities, such as hidden scripts and non-complying elements, are dropped.

The reconstruction process can be “soft” or “hard.”

In soft reconstruction, the CDR engine follows a set of pre-defined rules and policies and drops every component of a file that violates them, at zero cost in usability. For example, the original file is restructured using native drivers and matched against the rules and policies; anything that does not match is dropped (e.g., macros, embedded scripts, embedded files). This method maintains the fidelity, visibility and usability of the file.
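As a rough illustration of soft reconstruction, the Python sketch below rebuilds a zip-based document (modern Office files are zip containers) and copies across only the entries a policy allows. The `BLOCKED_SUFFIXES` list and the `soft_reconstruct` name are assumptions made for the example. It is deliberately simplified: a real engine would also repair the document's internal references to the dropped parts and would inspect the content of the entries it keeps, not just their names.

```python
import io
import zipfile

# Hypothetical policy: entry names ending in these suffixes are never copied.
# "vbaproject.bin" is where macro-enabled Office files store their VBA code.
BLOCKED_SUFFIXES = ("vbaproject.bin", ".js", ".vbs", ".exe")


def soft_reconstruct(data):
    """Rebuild a zip-based document, keeping only policy-compliant entries.

    Returns the rebuilt bytes and the list of entry names that were dropped.
    """
    source = zipfile.ZipFile(io.BytesIO(data))
    rebuilt = io.BytesIO()
    dropped = []
    with source, zipfile.ZipFile(rebuilt, "w") as target:
        for info in source.infolist():
            if info.filename.lower().endswith(BLOCKED_SUFFIXES):
                dropped.append(info.filename)
                continue
            # Write into a brand-new container instead of editing the original.
            target.writestr(info.filename, source.read(info))
    return rebuilt.getvalue(), dropped


if __name__ == "__main__":
    original = io.BytesIO()
    with zipfile.ZipFile(original, "w") as archive:
        archive.writestr("word/document.xml", "<doc>text</doc>")
        archive.writestr("word/vbaProject.bin", b"\x00macro")
    clean, removed = soft_reconstruct(original.getvalue())
    print("dropped:", removed)
```

Writing the allowed entries into a fresh container, rather than deleting entries from the original, mirrors the idea that the delivered file is a reconstruction and never the file that arrived.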

Hard reconstruction allows the CDR engine to practically rebuild the file, either by converting it to another file format or by recreating it using native drivers, at a minimal or moderate cost in usability.

For example, when a DOC file is converted to an HTML file, all the DOC-specific features such as macros and embedded files are essentially crushed, because the new format has no place for them. When this HTML file is converted back to DOC format, all the potentially dangerous components are gone. From a security point of view, this approach is very effective, since all the impurities (which usually include potentially malicious content) are eliminated.
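The same convert-and-rebuild idea can be sketched in Python with a simpler pair of formats. Converting DOC files needs tooling beyond the standard library, so this analogy takes an HTML page, reduces it to its visible text, and builds a fresh page from that text alone. The class and function names are hypothetical, and the usability cost is obvious: all formatting is lost along with the scripts.

```python
from html import escape
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Attributes (including event handlers such as onclick) are never copied.
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.paragraphs.append(text)


def hard_reconstruct(html_source):
    """Convert HTML to plain text, then build a new page from that text only."""
    extractor = VisibleTextExtractor()
    extractor.feed(html_source)
    extractor.close()
    body = "\n".join(f"<p>{escape(text)}</p>" for text in extractor.paragraphs)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"


if __name__ == "__main__":
    dirty = (
        '<html><body><p onclick="steal()">Quarterly report</p>'
        "<script>evil()</script><b>Totals</b></body></html>"
    )
    print(hard_reconstruct(dirty))
```

Nothing from the original markup survives the round trip; only the extracted text is carried into the new file, which is why active content has nowhere to hide.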

Such a highly stringent option trades some usability for virtually unhackable screening. However, it is important to note that this example was chosen to make the model easy to understand. In the industry, the top CDR engines take years of research and development, and typically come with a collection of much more sophisticated technologies to achieve a desirable balance between usability and security.


File-based malware is a large and heavily used attack vector. It warrants a different approach, one that augments the detection methods that might be sufficient to fend off other attack vectors.

Such an approach not only ensures that incoming files are safe to use but also retains the fidelity, functionality and usability of the file. In addition, CISOs and the information security department become enablers, improving productivity by allowing the acceptance of files that would otherwise be blocked.

Reference & with special thanks: “Addressing the Fundamental Deficiency in Today’s Mainstream Cyber Security Strategies From Detection to Non-Detection. Why Is CDR / CDNR So Important?” by Athena Dynamics

Yakov Yeroslav
Yakov Yeroslav is the Founder and CEO of Sasa Software. He is a former electrical engineer with over 20 years of experience in cybersecurity.
