Here’s my quick (and belated) take on the Black Hat 2015 sessions I attended. This year’s schedule offered a rich selection of Machine Learning-related content, and it is refreshing to see that Machine Learning is finally becoming a mainstream tool in the security community.
It goes without saying that all opinions are mine and not those of my employer. If I’m misjudging your session, feel free to reach out. My opinion is based on the data available, and it is of course always a challenge to cram months of research results into an hour-long session. (If you are still disgruntled, take comfort in the fact that you attended the Speaker Party while I did not.)
Where available, I have linked slide decks, whitepapers, or additional resources. Note that in some cases the slides have been updated since the event and differ from what was presented (my remarks apply to the version presented at the event unless noted otherwise).
The Lifecycle of a Revolution
Jennifer Granick’s keynote on the promise of a free Internet, an assessment of the current state of affairs, and an outlook and call to action. Thought-provoking and very relevant in light of the ongoing discourse on regulation and the Net.
Why Security Data Science Matters and How It’s Different: Pitfalls and Promises of Data Science Based Breach Detection and Threat Intelligence (Part 1)
Joshua Saxe covered the topic in two sessions. Part 1 dealt with the fundamentals of Machine Learning and the specific challenges in the security space. This was a thorough overview for newcomers (not unlike, shameless plug ahead, my CrowdCast on these topics).
Why Security Data Science Matters and How It’s Different: Pitfalls and Promises of Data Science Based Breach Detection and Threat Intelligence (Part 2)
Joshua’s second session covered three case studies of data science in the security space. I found the results presented on static analysis of executable files most interesting. The session included a high level of technical detail, novel ideas, and results in the form of hard numbers. If you have an interest in data science, this was definitely the session you wanted to be in.
Joshua pointed me to a follow-up paper on the static analysis work, which is available as a preprint over at arXiv as of last week.
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Alex Pinto (of MLSec Project fame) and Alexandre Sieira covered what to look out for when evaluating threat intelligence feeds in a lighthearted and healthily data-driven session. Interesting results: feeds generally have very spotty coverage, and threat sharing communities tend not to live up to their promise. The guys have previously been on the record as not buying into the efficacy of attribution (especially of the GeoIP kind), and in this session, too, they continued to call it as they saw it.
Abusing Windows Management Instrumentation (WMI) to Build a Persistent, Asynchronous, and Fileless Backdoor
WMI allows adding event-based triggers that can launch arbitrary actions. All of this is conveniently stored by Windows in an undocumented format in the registry. Aside from walking the audience through the details, Matt Graeber also announced the release of open source tools to make sense of said registry data.
Graphic Content Ahead: Towards Automated Analysis of Graphical Images Embedded in Malware
Alex Long presented research (and a UI frontend) that sifts through icons and other images in sample files and derives a distance metric based on image similarity, which in turn allows files to be clustered. An interesting 30-minute talk with a pleasing level of technical detail.
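To make the idea of an image-based distance metric concrete, here is a minimal sketch. It is not the method from the talk: I am assuming a simple average hash with Hamming distance as the metric and single-link grouping as the clustering step, and the file names and 4x4 grayscale "icons" are made up for illustration.

```python
from itertools import combinations


def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster(hashes, max_distance):
    """Group files whose icon hashes are within max_distance (single-link, union-find)."""
    parent = {name: name for name in hashes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(hashes, 2):
        if hamming(hashes[a], hashes[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in hashes:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical, already downscaled 4x4 grayscale icons (0 = black, 255 = white).
icons = {
    "a.exe": [[0, 0, 255, 255]] * 4,
    "b.exe": [[255, 0, 255, 255]] + [[0, 0, 255, 255]] * 3,  # one pixel differs from a.exe
    "c.exe": [[255] * 4, [255] * 4, [0] * 4, [0] * 4],       # a different layout
}
hashes = {name: average_hash(px) for name, px in icons.items()}
clusters = cluster(hashes, max_distance=2)
print(clusters)
```

With a threshold of 2 differing bits, the two near-identical icons end up in one group and the third file stays on its own. A real pipeline would first extract and downscale the embedded images, which this sketch skips.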
The Application of Deep Learning on Traffic Identification
In summary: Deep Learning on binary packet data makes it possible to distinguish between various protocols. A short 25-minute talk, overall nothing too surprising.
The Memory Sinkhole – Unleashing an x86 Design Flaw Allowing Universal Privilege Escalation
This was probably the most entertaining session, peppered with a tad of audience baiting and a dash of hype. Christopher Domas gave a great review of the underlying x86 architecture, its history, and the resulting problems. He then presented an attack that gets you from kernel land to below the hypervisor (all with a captivating narrative).
Defeating Machine Learning: What Your Security Vendor is Not Telling You
The basic premise of the talk is that having the same model on numerous machines allows an attacker to test his/her binaries until they are undetected. Overall, this talk left me disappointed. Letting an attacker tweak binaries for months to avoid detection, while allowing yourself only fractions of a second to classify them, is rightfully called out as a problematic approach (one that is also common to A/V). However, there is a bit more to ML usage than a static model on the end host, so calling out the defeat of ML mostly makes for a catchy title.
The proposed solution is to update the model depending on local data (i.e., labels) to derive an environment-dependent model. While this introduces variety, it does not prevent an attacker from tuning against each of the resulting models in the same fashion. It also increases the cost to the defender while the attacker can still cheaply generate variants (more will get caught, but throwing malware at an organization is not exactly a pricey endeavor).
From False Positives to Actionable Analysis: Behavioral Intrusion Detection, Machine Learning, and the SOC
I looked around the room and saw that there were people in the audience that got something out of this talk. Evidently, I was not among those people… Some interesting themes, and the whitepaper is at least on my reading list now.
Internet-Scale File Analysis
A very practical talk – if you have a large malware collection, you likely know very well the pains of running analysis on the data. This talk presented a framework (released as open source) aiming to solve these issues.
Deep Learning on Disassembly
The first half of this talk covered the basics of Deep Learning and convolutional neural networks. The second half was slanted towards original research that creates 1-bit images from disassembled binaries (one instruction per row), unfortunately with little technical detail. During a live demo, VirusTotal samples flagged by more than 20 engines were thrown at the model, which detected (if memory serves right) around 80% of these. No data on false positives was shared (and no ROC curve was shown for that matter), so the audience was left with an interesting idea but no hard results. Good news: the downloadable slide deck has some additional information.
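To illustrate why a detection rate on its own does not count as a hard result, here is a minimal sketch of how the points of a ROC curve are computed. The scores and labels are made up by me and have nothing to do with the model from the talk; the function name `roc_points` is likewise just for illustration.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    labels: 1 = malicious, 0 = benign. A sample is flagged when its score
    is greater than or equal to the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier scores for three malicious and three benign files.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

In this toy data, catching every malicious file already means flagging one of the three benign files. A quoted detection rate only becomes meaningful once you know which false positive rate it was bought with.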
Most Ransomware Isn’t as Complex as You Might Think
A brief 25-minute talk on the details of ransomware. We frequently see new samples of this kind land in our net, so I was curious about the properties that researchers are looking at. The main differentiator from other malware is that ransomware eventually must announce itself to extort money from the victim. This means, for example, that we can expect certain graphical UI elements to be part of the binary. Some other common features of ransomware are the utilization of crypto libraries, large scale file enumeration and deletion, or mechanisms to lock the user out of his/her desktop.
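As a rough illustration of how such traits could be turned into static features, here is a minimal sketch that checks a sample's import list against a few indicator groups. The API names are real Windows imports, but the grouping, the `INDICATORS` table, and the sample import lists are my own assumptions and not something presented in the talk. Plenty of benign software imports these functions too, so this is a feature extractor at best and not a detector.

```python
# Hypothetical indicator groups: real Windows API names, illustrative grouping.
INDICATORS = {
    "crypto": {"CryptAcquireContextW", "CryptGenKey", "CryptEncrypt"},
    "file_enumeration": {"FindFirstFileW", "FindNextFileW"},
    "file_deletion": {"DeleteFileW"},
    "desktop_lock": {"BlockInput", "SwitchDesktop"},
}


def matched_traits(imports):
    """Return the sorted names of indicator groups with at least one imported API."""
    imported = set(imports)
    return sorted(trait for trait, apis in INDICATORS.items() if imported & apis)


# Made-up import lists for two samples.
suspicious_imports = ["CreateFileW", "FindFirstFileW", "FindNextFileW",
                      "CryptEncrypt", "DeleteFileW"]
plain_imports = ["CreateFileW", "ReadFile"]

suspicious_traits = matched_traits(suspicious_imports)
plain_traits = matched_traits(plain_imports)
print(suspicious_traits)
print(plain_traits)
```

The resulting trait list could serve as one input among many for a classifier, alongside things like embedded UI resources.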
Update 08/19/2015: slides for the Data-Driven Threat Intelligence talk; also clarifications and a cat GIF. Thanks Alex!