The MITRE Engenuity ATT&CK Evaluations 2024 results are out and, with them, another year of vendors claiming victory. As a reminder, these evaluations have no winners or losers — just sweet, sweet data.

Case in point: MITRE ATT&CK tracks and tests techniques that could be completely benign, even something as simple as T1059.004, which launches a Unix shell. Depending on the user, this could be totally normal activity, but it could also be an attacker. Similarly, T1059.002, using AppleScript, could be perfectly legitimate and was actually used in the test to generate benign noise.
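To make that concrete, here is a minimal and entirely benign Python sketch that would nonetheless register as T1059.004 from an endpoint sensor's perspective, since the parent process spawns a Unix shell (the command itself is a made-up placeholder for a routine admin task):

```python
import subprocess

# A routine, harmless task run through a Unix shell. From an EDR
# sensor's point of view, this process just spawned /bin/sh
# (ATT&CK T1059.004), exactly the behavior an attacker would exhibit.
result = subprocess.run(
    ["/bin/sh", "-c", "echo disk check complete"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # -> disk check complete
```

Whether this event is benign or malicious depends entirely on who ran it and why, which is exactly why a raw "detected T1059.004" count says little on its own.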

If a vendor says that it achieved 100% on the evaluations, it is likely doing one or more of the following:

  • Manipulating the results by showing only the portions it believes benefit it
  • Turning on settings in the product that are unrealistic for a real-world environment so as to appear more effective
  • Treating the results as a competition instead of a learning opportunity and a chance to improve the product

So long as you treat these evaluations as informative data rather than a contest with winners and losers, you can get real value out of the results. With all that silliness aside, let’s get into what you need to know.

The evaluation broke new ground with macOS.

The evaluations focused on two adversary scenarios: ransomware targeting Windows and Linux (CL0P, LockBit) and DPRK targeting macOS.

Range operating systems:

  • Windows: Windows Server 2022, Windows 11
  • Linux: Ubuntu 22.04.x LTS
  • macOS: macOS Sonoma 14.x (Apple Silicon)

The focus on macOS is a new addition to the evaluations. It’s exciting to see this type of evaluation cover macOS, as the capabilities that tools have on this OS tend to be more of a black box than the more well-tested capabilities on Windows and Linux.

The evaluations take place over several days per vendor. They kick off with detection rounds, then allow a day for configuration changes and retests (which could include deploying additional detection rules, gathering additional telemetry, making changes to the UI, etc.). The protection round is executed last. All emulations were done post-compromise to examine the detection and protection capabilities once an adversary gained access.

Background noise and alert volume make the detection results especially useful.

One interesting hurdle MITRE introduced this round is background noise and false-positive tracking. MITRE generated additional benign signals to serve as background noise and tracked any alerts raised on them. This tests a product’s ability to flag only truly malicious behavior rather than alerting on benign activity. It also makes it harder for vendors to crank up detection sensitivity to alert on everything, a tactic that has skewed vendor results in the past.

MITRE also introduced a “volume” metric. This was a much-needed addition, as in the past, some vendors issued thousands of alerts in a single scenario, which, in practice, leads to a lower-quality analyst experience. Now, the results show exactly how many alerts were triggered for each scenario and the severity of those alerts.
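For intuition, the volume metric boils down to a per-scenario count of alerts broken out by severity. A toy sketch with made-up alert data (the rule names and severities are hypothetical, not from the evaluation):

```python
from collections import Counter

# Hypothetical alert stream for one scenario; names and severities
# are invented for illustration only.
alerts = [
    {"rule": "suspicious-shell", "severity": "high"},
    {"rule": "new-scheduled-task", "severity": "medium"},
    {"rule": "file-archive", "severity": "low"},
    {"rule": "suspicious-shell", "severity": "high"},
]

# The two things the results pages now surface per scenario:
# total alert volume and a severity breakdown.
volume = len(alerts)
by_severity = Counter(a["severity"] for a in alerts)
print(volume, dict(by_severity))  # -> 4 {'high': 2, 'medium': 1, 'low': 1}
```

Two products can detect the same steps while one drowns analysts in hundreds of low-severity alerts; this metric finally makes that difference visible.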

Protection micro-emulations give more granular results.

This year, protections had a separate emulation plan from detections (though one still focused on ransomware), which helped keep the test realistic. In addition, MITRE tested protections via micro-emulation plans, which it defines as compound behaviors involving a short series of related ATT&CK techniques that are frequently used together in real-world attacks.

Instead of running the entirety of the emulation end to end, MITRE bundled a select few techniques together. For example, Test 1 looked at enumeration and exfiltration via batch script and rclone (a combination of added noise [T1059.003, T1105, T1021.001] and actual activity [T1560.002 and T1048.003]). This is not the full scope of the attack, but it is a series of steps that are common in attacker activity.
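As a hedged illustration (not MITRE's actual emulation code), the "actual activity" half of that test, archiving collected data via a library (T1560.002) before exfiltration (T1048.003), can be sketched with Python's zipfile module; the staging directory and file names below are made up:

```python
import zipfile
from pathlib import Path

# Hypothetical staging directory holding "collected" data;
# the names are invented for illustration.
staging = Path("staging")
staging.mkdir(exist_ok=True)
(staging / "inventory.txt").write_text("host-01\nhost-02\n")

# T1560.002 (Archive Collected Data: Archive via Library): bundle the
# files using a library call rather than a standalone archiving tool.
archive = Path("collected.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    for f in sorted(staging.iterdir()):
        zf.write(f, arcname=f.name)

# T1048.003 would then move collected.zip off-host over an unencrypted
# non-C2 protocol (rclone in the evaluation); that step is omitted here.
print(zipfile.ZipFile(archive).namelist())  # -> ['inventory.txt']
```

Note that each step in isolation is unremarkable, which is precisely the point the protection results need to be read against.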

Using micro-emulation plans is important when testing preventive controls: instead of the attack being blocked at the very start, you can see exactly how effective the tool is at blocking each portion of an attack. It’s important, however, to remember that expecting a tool to block every micro-emulation plan is unrealistic, as certain actions should not be blocked in isolation. For example, archiving collected data and then exfiltrating it, as mentioned above, is not necessarily malicious. Some prevention methods rely on understanding user behavior or indicators of compromise. Further, due to the constraints of the test, the testing doesn’t consider locking down user account permissions based on use case or the tuning that happens over time as analytics learn typical user activity.

It’s still difficult to know what to do with the results.

The MITRE team has put a lot of work into making the results consumable via a very easy-to-use results page that lets you compare and contrast different vendors, see screenshots of their capabilities, and clearly see alert volume. We highly recommend looking through this page. With that said, we will be releasing a more in-depth report in the coming months that provides more complete details on the evaluation results and how to use them.

Stay tuned, and if you’re a Forrester client with more questions, book an inquiry or guidance session with me.
