Sven Krasser’s Blog

Musings on technology, security, and sundries.

VGA from Scratch (Part 1)

It’s time again to leave the realm of Big Data behind for a small electronics project. After generating a VGA signal using an Arduino, I’ve decided to take the next step and generate a VGA signal from scratch. From scratch here means using 74HC00 series logic ICs.

The video mode I picked is XGA at a 60 Hz refresh rate. XGA has a resolution of 1024×768 pixels and a pixel frequency of 65 MHz. By dividing the horizontal resolution by 4, we get a width of 256 pixels and a pixel frequency of 16.25 MHz. To keep the aspect ratio, I am also dividing the vertical resolution by 4, so the effective resolution produced is 256×192.
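The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the paragraph: the constant and function names are made up for illustration, and it ignores blanking and sync intervals entirely.

```python
# Sketch of the mode arithmetic only; all names here are illustrative.
XGA_WIDTH = 1024                  # horizontal resolution in pixels
XGA_HEIGHT = 768                  # vertical resolution in pixels
XGA_PIXEL_CLOCK_HZ = 65_000_000   # XGA pixel frequency at 60 Hz refresh
DIVISOR = 4                       # same divisor on both axes keeps the aspect ratio


def scaled_mode(width, height, pixel_clock_hz, divisor):
    """Return (width, height, pixel clock in Hz) after dividing each by divisor."""
    return width // divisor, height // divisor, pixel_clock_hz / divisor


width, height, pixel_clock_hz = scaled_mode(
    XGA_WIDTH, XGA_HEIGHT, XGA_PIXEL_CLOCK_HZ, DIVISOR
)
print(f"{width}x{height} at {pixel_clock_hz / 1e6} MHz")  # prints "256x192 at 16.25 MHz"
```

Both 1024:768 and 256:192 reduce to 4:3, which is why the same divisor is applied to both axes.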

Spark and Speed

The typical response I get when I mention our usage of Spark is something along the lines of “Oh, it must be about the extra speed over Hadoop you get from the in-memory processing.” Speed and the in-memory aspect are certainly two things Spark is known for, and they are also touted prominently on the project’s website. However, neither of those is among the primary reasons why I invested resources to move my team to Spark as the default Big Data framework. Let’s take a look at what makes the difference.

Black Hat 2015 in Review

Here’s my quick (and belated) take on the Black Hat 2015 sessions I attended. This year’s schedule offered a rich selection of Machine Learning-related content, and it is refreshing to see that Machine Learning is finally becoming a mainstream tool in the security community.

It goes without saying that all opinions are mine and not those of my employer. If I’m misjudging your session, then feel free to reach out. My opinion is formed based on the data available, and it is of course always a challenge to cram months of research results into an hour-long session. (If you are still disgruntled, take comfort in the fact that you attended the Speaker Party while I did not.)

Where available, I have linked slide decks, whitepapers, or additional resources. Note that in some cases the slides have been updated since the event and differ from what was presented (my remarks apply to the version presented at the event unless noted otherwise).

Arduino Raster Bars

Since I’ve always liked to understand technology from first principles, I’ve embarked on a small project to generate a VGA signal from scratch on an Arduino Uno. (On the other hand, it could also be that after all the Big Data work, a small data project in the 2 KB of RAM the Uno offers sounded quite appealing.)

Join the CrowdStrike Data Science Team

CrowdStrike Data Science is expanding: come and join the cause, and see our job description. We’re at the intersection of Machine Learning, Big Data, and Internet Security (you don’t need to be an expert in all areas).

Data Science is in high demand. With so many options, why work for CrowdStrike? For starters, we’re a rapidly growing startup with a fun work environment and an awesome team. We have large-scale rich data sets and the corresponding ground truth to conduct meaningful supervised learning on them. We have a diverse and multidisciplinary team, so if you run into a problem or have a question, chances are a teammate has the answers. For example, need to understand that data field collected from that obscure Windows kernel API? Why not ask the guy who wrote the book about it?

We need your help extracting meaning from all that data. If you’re a person who ventures where no one has gone before, who likes to see their ideas implemented and making a difference, and who wants to shape the direction of a growing team, then talk to us. Interested? We’d like to hear from you at mission@crowdstrike.com!

Open Source Software – Who Actually Reviews the Code?

This post is co-authored by Robby Simpson and Sven Krasser. So, you can find it on both Robby’s and Sven’s blogs – you should check them both out!

Last year saw a large number of critical bugs in open source software (OSS). These bugs received a lot of media attention and re-opened the discussion of bugs and security in OSS. This has led many to question whether ESR’s famous statement that “Given enough eyeballs, all bugs are shallow” holds true.

A Trip Down Memory Lane (in 2.5D)

Last week, I picked up the re-released version of Duke Nukem 3D for PS3, mostly for old time’s sake (and because I have a soft spot for retrocomputing), and it made me wonder how much time has passed since its original release. Wikipedia to the rescue: the original Duke Nukem 3D was released in 1996. That interestingly means that more time has passed between its release and the present day than between its release and the release of Pac-Man. The latter was released in 1980.

It is truly amazing how much progress in computer graphics has been made in those 16 years between 1980 and 1996. While the next 16 years have been quite impressive as well (and gave us e.g. GPUs and real-time caustics in the browser), they weren’t the same kind of quantum leap.

Duke used a 2.5D engine, the Build engine, for which Fabien Sanglard wrote an excellent overview. A true 3D shooter on comparable hardware was available about a year earlier in the form of Descent. It’s also noteworthy that Quake was released 5 months later, also featuring true 3D.

Orb-3

Last year, I was fortunate enough to be invited by NASA to attend the launch of Orb-3 along with JPL’s RACE team, whose cubesat was part of the cargo of this mission to the ISS. As you may recall, Orb-3 was the launch that suffered a catastrophic failure shortly after liftoff and fell back near its launch pad on Wallops Island, VA. We watched the launch from a viewing area just shy of two miles away from Pad 0A. Some of my pictures are below.

New Post on CrowdStrike Blog

Yesterday I posted a new article on the CrowdStrike blog with some follow-up thoughts to the Machine Learning webcast. The post covers another concrete example of how combining weak indicators can generally yield stronger ones. (If you are familiar with ML, this won’t make you raise an eyebrow.) It also covers various areas of application in the security space. Specifically, for cloud-based security, there is an opportunity to go beyond the small data sets that e.g. AV can leverage and look beyond the first few seconds of execution on a single machine.
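To give a rough feel for why combining weak indicators helps, here is a toy calculation. It is not the example from the CrowdStrike post: it assumes, purely for illustration, that each indicator is independently correct with the same probability and that the indicators are combined by a simple majority vote. The function name is made up.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent indicators is correct.

    Assumes each indicator is correct with probability p, independently of
    the others, and that n is odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    # Sum the binomial probabilities of k correct indicators for k > n / 2.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Accuracy of the combined vote for a growing number of 60%-accurate indicators.
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions the combined accuracy rises with the number of indicators. Real indicators are rarely independent, so the gain in practice is smaller than this idealized calculation suggests.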

One question that I got after the webcast was what toolchain I used. Long story short, I processed the final data in IPython using scikit-learn. For the figures, I used matplotlib with seaborn. For feature extraction, we’ve used both Python and Pig. If you have any questions or feedback, tweet me @SvenKrasser.

First Post

Since the 140 characters that Twitter grants you felt a bit limiting at times, I’ve decided to set up this blog to convey the occasional more verbose thoughts on technology and the Internet security space. Based on my current research interests and area of work, expect a slant towards Machine Learning and Big Data topics.

Just to get started, here’s a recording of a recent webcast on Machine Learning fundamentals and its applications at CrowdStrike that I did jointly with Dmitri.