Friday, October 16, 2015

Bro in the Classroom

This past Wednesday, I was lucky enough to be asked to lead an evening class at DePaul University on using Bro. The students in the class are preparing for an annual cyber defense competition called CCDC. This competition is actually where I cut my teeth defending realistic computer networks. I had the opportunity to participate through multiple years of competition, and in one of them we even made it to the national level. After graduating I also had a chance to experience CCDC from the other side while participating on the red team. Last Wednesday, for the first time, I had a chance to see the other side of a university course while leading the class.

I was given three hours for the class, which I split 60/40 between lecture and hands-on lab time. The lecture material presented the foundational topics needed to understand what Bro does and how it works. A decent understanding of protocol design, event-based systems, system administration, and network forensics is required before thinking about Bro. These topics were covered briefly, followed by Bro-specific material. I also briefly discussed ElasticSearch and ELK and how those projects integrate with Bro (the ElasticSearch log writing code is now a plugin).

I've published the slides I presented for the lecture as well as the lab (which includes the step-by-step commands needed to set up Bro and ELK).

Instructing a class was an experience; instructing a class remotely was definitely a challenge, and I thank everyone for their patience with my ignorance of remote meeting software. I really enjoyed it and hope all the students gained *some* knowledge from the material even if it was only from my opinionated tangential rants about RFCs and HTTP.

Tuesday, September 1, 2015

Bro Plugins Part II: Redis

Plugins Part II
Plugins in Bro have been around for a while; however, they are starting to move outside of the "core" and are now beginning to resemble Apache modules. I think this is wise for a few reasons. One, as Bro becomes popular, certain features need to stay in the core and certain features definitely need to stay out of it. This is the same reason why the NumPy or pandas libraries aren't included in Python by default. They offer fantastic solutions to specific problems but not everyone needs or wants them. Two, as more people start to use Bro and want to customize it, an issue around trusted extensions and capabilities arises. Moving to a plugin architecture will push the burden of vetting code to the Bro operator and away from the Bro development team. Three, an external plugin architecture provides a nice landscape for a package manager, which has been on the Bro todo list for a little while.
To test out the Redis plugin, simply clone the Bro repository recursively as typically done, configure and build Bro, change to aux/plugins, and run "make build-redis". Lastly, you'll need to set your BRO_PLUGIN_PATH environment variable to the location of aux/plugins. To compile the plugin you'll need a few development header files installed, but if all was successful you should see a dynamically loaded Redis plugin when you run:

bro -N 2>&1 | less -S
All the configurable options for the plugin are located in aux/plugins/redis/scripts/init.bro, which is bootstrapped by aux/plugins/redis/scripts/__load__.bro as one would expect.

If you then start the Redis server on the same host you are running Bro on and invoke Bro with the script aux/plugins/redis/scripts/Bro/Redis/logs-to-redis.bro you should eventually see protocol logs which would have been sent to Redis dumped to stdout. Currently, I could only get the plugin to execute the Redis PING command for each log entry Bro was ready to write.

It will be interesting...
It will be interesting for reading/writing to/from memory instead of disk. Imagine using Redis as an LRU cache to store things like HTTP user-agents seen egressing a network or creating a hot (Redis) and cold (ElasticSearch) storage system for complex hash maps of connection tuples and service types. Check out this sort of old article outlining how HipChat did it.
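
To make the LRU idea concrete, here is a minimal in-memory sketch. It is plain Python standing in for Redis, not anything the plugin provides: the `LRUCache` class, its `touch` method and the sample user-agent strings are all made up for illustration. Redis itself can be configured to evict keys under an LRU policy, so a real deployment would lean on that rather than on code like this.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` keys, evicting the least recently used one."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()   # key -> number of sightings

    def touch(self, key):
        """Record a sighting of `key`; return True if it was already cached."""
        seen = key in self._items
        self._items[key] = self._items.get(key, 0) + 1
        self._items.move_to_end(key)          # mark as most recently used
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used
        return seen

    def keys(self):
        """Cached keys, least recently used first."""
        return list(self._items)


# Hypothetical user-agents seen leaving the network, in arrival order.
cache = LRUCache(capacity=2)
for agent in ["curl/7.43.0", "Mozilla/5.0", "curl/7.43.0", "Wget/1.16"]:
    cache.touch(agent)
print(cache.keys())
```

With a capacity of two, the second sighting of curl keeps it fresh, so Mozilla is the entry that gets evicted when Wget shows up.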

It will be interesting to use for logging. Redis is often used as a broker back end in scalable applications and I think having logs written to it is a big first step in building larger scale systems which act on network activity. Other processes and systems may be able to ingest Bro logs much more quickly than if reading from flat files or having to hook into Bro-specific client libraries.

It will be interesting for the Input framework. Having Redis as a back end could also allow easier integration with third-party intelligence providers. For example, external Python scripts could be used to parse network indicators from those flashy [adjective] [noun] titled threat reports vendors publish and push those threat indicators into Redis for Bro to ingest and notify against.
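
A rough sketch of what such an external script might look like follows. Everything in it is an assumption made for illustration: the `intel:addr` and `intel:md5` key names are hypothetical, the report text is invented, and the script only builds Redis SADD commands as tuples instead of talking to a server, so it runs without a Redis client library installed.

```python
import ipaddress
import re

# Hypothetical Redis set names; nothing in the plugin defines these.
KEYS = {"addr": "intel:addr", "md5": "intel:md5"}

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")


def extract_indicators(text):
    """Return sorted, de-duplicated IPv4 addresses and MD5 hashes found in text."""
    addrs = set()
    for candidate in IPV4_RE.findall(text):
        try:
            addrs.add(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            pass  # looked like an address but is not a valid one
    hashes = {h.lower() for h in MD5_RE.findall(text)}
    return {"addr": sorted(addrs), "md5": sorted(hashes)}


def to_redis_commands(indicators):
    """Build SADD commands as argument tuples; sending them is left to a client."""
    return [("SADD", KEYS[kind], value)
            for kind in ("addr", "md5")
            for value in indicators[kind]]


# Invented report text using documentation-only address ranges.
report = ("The implant beacons to 203.0.113.7 and 198.51.100.23 "
          "(also 203.0.113.7). Dropper MD5: D41D8CD98F00B204E9800998ECF8427E. "
          "Ignore 999.1.1.1.")
commands = to_redis_commands(extract_indicators(report))
for command in commands:
    print(" ".join(command))
```

Keeping the parsing separate from the transport means the same indicators could just as easily be written to a file for the Intel framework to read.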

In Conclusion
It's good to see Bro playing nice with other open source projects. The JSON log writer was a huge boon for those indexing logs with ElasticSearch and Mongo. Plugins of these types will assist in more widespread use and adoption of Bro outside of research labs.

Sunday, August 16, 2015

Exploring Rabbit Holes

First off, thanks to @brucedang for getting me back into this blog. I started writing this thing for myself in hopes of keeping track of what I've done with Bro and how I learned it. It's rewarding to hear other people find it useful too.

I was tossing around a few ideas for a blog post and decided to grep through scripts/ for modules to discuss. If you search GitHub for "module" you can find them, but the command I used to do so was:
grep -R '^module' ./* | cut -d':' -f2 -s | sort -u | less -S

I came across a few things I hadn't heard of or didn't remember digging into before. The KRB module was new to me. It's for Kerberos protocol analysis. Other modules were more familiar to me but I haven't grokked the code in them. Things such as:
  • Known
  • Weird
  • Files
  • PE
  • AppStats

I also found a confusing module called Threading. I found this confusing because I was under the impression Bro was single threaded. The documentation for setting up a Bro cluster discusses CPU pinning for workers. What was this threading module used for?

I grepped through scripts/ again for any references to this module with:
grep -R '^module Thread' ./*
One hit, in scripts/base/init-bare.bro. This file is hard coded (linking to lines in GitHub generally isn't immutable, so search for 'add_input_file("base/init-bare.bro");') to load on every invocation, even when Bro is run in "bare" mode (a mode where Bro loads a minimal set of scripts). So I opened this file and found a very small amount of Bro script belonging to that module's namespace. In the module's export declaration was a single re-definable constant called "heartbeat_interval". It was set to 1.0 seconds. There were also some comments stating that changing this value will likely break some things.

Interesting. So I searched through scripts/ again, this time looking for references to "heartbeat_interval". I found zero use of the value. So why was it defined at every Bro invocation?

At first I thought it must be dead code. Perhaps the module was left over from an unfinished feature or perhaps it was from a feature that was available in an older version but no longer supported. Large C++ projects are often notorious for having dead code. But this is the Bro team; I should have known better.

I went to GitHub again and looked through the blames around that line in init-bare. The definition of heartbeat_interval was added in a merge named "topic/robin/input-thread-merge". I was still confused. So I grepped through src/ to try to find where this value was used. This time I found a few different .cc files.

Again going back to GitHub and looking at the blames for these files, I found the heartbeat_interval variable is used in the Input framework. Going back to the Input framework documentation, the use of threads for reading data into Bro is described. Thus the mystery was solved. Threads are used for reading in files from disk for Bro to use, such as in the Intel framework. I had encountered the use of threads in the Input framework before but didn't truly know how they worked until now.
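
The general pattern of a reader thread feeding a main loop can be sketched in a few lines. This is a toy in Python, not a description of Bro's C++ threading code: the `reader` and `consume` functions are made up, and using the heartbeat as a queue timeout is an assumption chosen only so the main loop regains control at a regular interval.

```python
import queue
import threading

HEARTBEAT_INTERVAL = 1.0  # seconds; borrowed from the default value found above


def reader(lines, out):
    """Runs in its own thread: push each input line onto the queue."""
    for line in lines:
        out.put(line.rstrip("\n"))
    out.put(None)  # sentinel meaning "no more input"


def consume(lines, heartbeat=HEARTBEAT_INTERVAL):
    """Main loop: collect lines from the reader thread, waking every heartbeat."""
    out = queue.Queue()
    worker = threading.Thread(target=reader, args=(lines, out), daemon=True)
    worker.start()
    received, idle_beats = [], 0
    while True:
        try:
            item = out.get(timeout=heartbeat)
        except queue.Empty:
            idle_beats += 1   # nothing arrived during this interval
            continue
        if item is None:
            break
        received.append(item)
    worker.join()
    return received, idle_beats


# Hypothetical input: two lines as they might come from an intel file.
entries, beats = consume(["203.0.113.7\n", "198.51.100.23\n"], heartbeat=0.1)
print(entries)
```

The main loop never blocks on disk; it only ever waits on the queue, which is the property that makes a separate reader thread attractive.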

Pretty neat. 

This is how I found one of the many interesting pieces of Bro. If you poke around the scripts and core source enough, you find a huge amount of interesting code and concepts.

Sunday, October 5, 2014

Out of the Scripts and Into the Core

In the previous post I discussed plugins in Bro and how they could be used to move code from the core to self contained directories. After talking with Seth, I realized plugins had been around in Bro for a while (longer than I thought). I had no idea as most of my experience with Bro was in scriptland. The input framework was built with Bro's plugin system and that's been around for a bit.
Seth suggested I try to move the geoIP lookup features of Bro into a plugin. I immediately thought, with my great new understanding of plugins, it would be quite simple. The geoIP lookup code relies on libgeoip and the legacy MaxMind C API. It exposes a few additional built in functions in bro.bif and makes use of those functions in a few base/policy scripts. I thought it would be easy, involving a few copy/paste commands and moving some script files around. It turns out the job requires knowing some C++ and being more familiar with Bro's internals than I initially thought.
I can read C++ and write some very basic "Hello World" type programs; that is the extent of my C++ knowledge. I also had never really dug deep into Bro's source before. I did do this. However, looking back, that was relatively trivial. After grep'ing through much of the src/ directory, I decided I should document some key findings about the core before attempting to modify it. The following is meant to be rough notes and references on C++ and the source which comprises the core.

Generic C++ Programming Concepts
C++ has been around for a while. These concepts aren't anything new to many and might seem very rudimentary to some. I approach Bro from the perspective of someone with relatively strong systems, networking, and security experience. I would say my understanding of programming, especially in C++, isn't as refined.
- C++ is a statically typed compiled language. It can be thought of as an extended version of C.
- Things need to be declared before they are used. This is fairly common across languages I'm used to.
- All things have a scope. Global vs local scope is important and influences references. Local takes precedence.
- Everything has a namespace. Namespaces are a mechanism for grouping things so that we don't have to dump everything into a global scope. Python has namespaces in packages (e.g. the match function in re) and Bro has namespaces as modules. In fact modules in scriptland translate to C++ namespaces.
- C++ is object oriented, which means classes. Classes are data type definitions. A thing of a particular class is called an object. This isn't specific to C++. Classes have member functions; in Python, the string class has a split() function. Some class variables and methods are public and some are private, which determines how the variable or method can be accessed.
- C++ supports (multiple) class inheritance. Classes can inherit structure from base classes. An example of this is a class that describes a bluebird. The bluebird class inherits attributes and structure of the bird class and, in turn, the bird class inherits attributes and structure of the animal class. Class inheritance provides a generalized way of writing classes that maximizes code reuse.
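
The class and inheritance bullets above can be sketched in a few lines. The sketch is in Python rather than C++, only because the list already leans on Python for its analogies. The class names follow the bluebird example, while the `describe` and `can_fly` methods and the `_heart_rate` attribute are made up for illustration. One real difference to keep in mind: Python marks members private by convention only, whereas C++ enforces access at compile time.

```python
class Animal:
    def __init__(self, name):
        self.name = name
        self._heart_rate = 0   # leading underscore: private by convention only

    def describe(self):
        return f"{self.name} is an animal"


class Bird(Animal):
    """Inherits name handling from Animal and adds bird behavior."""

    def can_fly(self):
        return True

    def describe(self):
        return f"{self.name} is a bird"


class Bluebird(Bird):
    """Inherits from Bird, and through Bird from Animal."""

    color = "blue"   # class variable shared by every Bluebird

    def describe(self):
        # Reuse Bird's wording instead of rewriting it.
        return super().describe() + f", specifically a {self.color} one"


skye = Bluebird("Skye")
print(skye.describe())
print(skye.can_fly())
```

`Bluebird` never defines `can_fly` or `__init__`, yet both work, which is the code reuse the last bullet is describing.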
The Core
Much of Bro's source consists of class definitions and by reading (well, OK, skimming) through the source (well, OK, the header) files, I came up with four buckets that most of the code in Bro's core falls into. These aren't meant to be comprehensive. I didn't write the core so don't take my word for it. Instead, go read it for yourself.
1) Process network traffic
2) Glue scriptland to the core
3) Provide infrastructure the core needs
4) Offer other features

1) Network traffic processing
Bro's main source of awesome comes from its traffic analysis capabilities. Similar to humans and pantlegs, Bro has to do this analysis packet by packet just like any other network sniffer. The code I found of interest in this bucket included the analyzers (duh) (which deserve their own blog post entirely) and the NetSessions class and how it is used to interface with packet processing. The Frag.* and Reassem.* files help out here as well. The *.pac files in the src/ directory tie binpac analyzers and Bro together. One thing that surprised me is that Bro can still do some neat things without processing any traffic. It can be called with a script and no interface. It can be invoked similar to other scripting language interpreters.

2) Scriptland
The interfaces Bro provides for scriptland to control how Bro behaves are extremely intriguing. I've never lifted the hood to see how scripting languages are implemented before this exercise. I'd be curious how other languages, like Ruby, do things. Scriptland functionality can be broken down into an additional three subtypes.
a) Interpreting the scripts
b) Script object abstraction
c) Exposing things the core can do for scriptland

a) Lexical Analysis
I had to do some reading to really understand what was going on here... Bro scripts need to be read by the core and interpreted before they take effect. Similar to how you are reading this right now, given a set of rules a computer can be taught to read source code. Written English's rules include: sentences read left to right, ideas stop at periods, words end with spaces, etc. Policy scripts don't have words or sentences but do have their own grammar. For example, expressions make up statements, which are semi-colon delimited and make up frames, which are curly brace delimited. Bro's core needs to represent these things to be able to interpret them. Base classes like Expr, Stmt, Frame, and Scope offer structures which can be specialized through class inheritance to provide different kinds of expressions or statements (if statement, print statement, for statement, etc.). The Location class provides line and column numbers for debugging interpreted scripts.
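
A toy version of the class-per-construct idea is sketched below. It borrows the names Expr, Stmt and Location from the paragraph above, but the subclasses, method names and the dictionary used as a scope are all invented here; Bro's real classes are C++ and far richer. The point is only to show how inheritance lets each kind of expression or statement know how to evaluate itself, and how a location travels along for error messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    line: int
    column: int


class Expr:
    def __init__(self, loc):
        self.loc = loc

    def eval(self, scope):
        raise NotImplementedError


class ConstExpr(Expr):
    def __init__(self, loc, value):
        super().__init__(loc)
        self.value = value

    def eval(self, scope):
        return self.value


class NameExpr(Expr):
    def __init__(self, loc, name):
        super().__init__(loc)
        self.name = name

    def eval(self, scope):
        if self.name not in scope:
            raise NameError(f"line {self.loc.line}, column {self.loc.column}: "
                            f"unknown identifier {self.name}")
        return scope[self.name]


class Stmt:
    def __init__(self, loc):
        self.loc = loc

    def exec(self, scope, out):
        raise NotImplementedError


class PrintStmt(Stmt):
    def __init__(self, loc, expr):
        super().__init__(loc)
        self.expr = expr

    def exec(self, scope, out):
        out.append(str(self.expr.eval(scope)))


class IfStmt(Stmt):
    def __init__(self, loc, cond, body):
        super().__init__(loc)
        self.cond, self.body = cond, body

    def exec(self, scope, out):
        if self.cond.eval(scope):
            for stmt in self.body:
                stmt.exec(scope, out)


# A hand-built tree for: if ( verbose ) { print msg; }  print 42;
scope = {"verbose": True, "msg": "hello"}
program = [
    IfStmt(Location(1, 1), NameExpr(Location(1, 6), "verbose"),
           [PrintStmt(Location(2, 5), NameExpr(Location(2, 11), "msg"))]),
    PrintStmt(Location(4, 1), ConstExpr(Location(4, 7), 42)),
]
output = []
for stmt in program:
    stmt.exec(scope, output)
print(output)
```

A real interpreter would build this tree from source text with a lexer and parser; here it is built by hand to keep the focus on the class hierarchy.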

b) Object Abstraction
Variables and data in scriptland need to be represented in C++ for the core to work with them. Variables and constants (scriptland native data types) are passed between policy scripts and Bro's core as pointers to Val objects. Scriptland defined types are created through the BroType interface. BroType creates BroObj (a base class with many children) objects. Event handlers are referenced between the core and scriptland as EventHandlerPtr objects. The boolean value of an EventHandlerPtr indicates if scriptland references it anywhere. I'm assuming this is used for optimizing callbacks in the core. Modules in scriptland are C++ namespaces in the core, where GLOBAL is the global default namespace. Below is a rough mapping of scriptland things to C++ classes.

Scriptland                The Core
pretty much everything    BroObj
all values                Val
specific values           *Val (EnumVal, PortVal, AddrVal, RecordVal, TableVal, etc.)
event                     Event, EventHandlerPtr
attribute                 Attr (&expire, &redef, &priority, etc.)
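
The guess above about EventHandlerPtr, that its boolean value lets the core skip work nobody asked for, can be illustrated with a small sketch. This is Python and entirely hypothetical: the `EventHandler` class, `raise_event` function and event names are stand-ins, and the sketch shows the assumed optimization rather than Bro's actual C++ code.

```python
class EventHandler:
    """Toy stand-in: truthy only when at least one script handler is registered."""

    def __init__(self, name):
        self.name = name
        self._handlers = []

    def register(self, func):
        self._handlers.append(func)
        return func

    def __bool__(self):
        return bool(self._handlers)

    def dispatch(self, *args):
        for func in self._handlers:
            func(*args)


def raise_event(handler, build_args):
    """Call build_args() only if something is listening; return True if dispatched."""
    if not handler:
        return False          # nobody cares, so skip building the arguments
    handler.dispatch(*build_args())
    return True


seen = []
http_request = EventHandler("http_request")
dns_request = EventHandler("dns_request")
http_request.register(lambda method, uri: seen.append((method, uri)))

sent_http = raise_event(http_request, lambda: ("GET", "/index.html"))
sent_dns = raise_event(dns_request, lambda: ("example.com",))
print(seen, sent_http, sent_dns)
```

Passing the arguments as a callable is what makes the check worthwhile: when no script handles an event, the potentially expensive work of assembling its arguments never happens.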

c) Bifs
Built in function (Bif) files declare and define functions that are available from scriptland but are implemented in C++. They are essentially C++ source files with a special escape syntax. Events, types, and constant variables used in scriptland are also declared and/or defined in Bifs.

3) Core Infrastructure
Some portions of the core don't process traffic, interpret scripts or provide cool features. Classes like Timers and Managers (*Mgr) are needed for managing Bro's complex internals but generally aren't immediately apparent to people who live in scriptland. Other things like BSD style getopts, signals, pipes, and custom queues are fundamental things for Bro being successful even though they are abstracted away.

4) Other features
This category is sort of a catch all. If it's not traffic processing or script interpreting but is available to a Bro user or programmer, it's probably in this bucket. Some features in this bucket are implemented as plugins, like the input, probabilistic, logging and file_analysis frameworks. Some features are tied into the core of Bro (which doesn't make sense to me, but could just be legacy) like IP anonymization (Anon.*) and IP geolocation.

I think moving away from libgeoip's C API would be smart. It would remove a dependency and move the code out of the core and into a plugin. It may even provide an opportunity to upgrade to MaxMind's second version file type (mmdb). However, that is easier said than done. I'm still working on it... Maybe I'll figure it out before I get distracted and start playing with Binpac++.

Monday, September 22, 2014

Bro Plugins

A new paradigm has been floating around the development branches of Bro: plugins. Not broctl plugins but compiled plugins dynamically available to the Bro binary at run time. These plugins are similar to Apache modules. I fully support Bro plugins. They will hopefully reduce the size of the core. They will hopefully make creating a Bro package management solution much easier. That would make extending and customizing Bro much simpler.

I had at one point wanted a Levenshtein distance function available to me from scriptland. I ended up writing it into the core by extending the strings.bif file before compiling Bro. This provided me what I wanted, but was rather difficult to do. The new plugin architecture will allow me to remove this functionality from the core (I honestly think I'm the only person that has used this function) and distribute it as a plugin, similar to a Python module.
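
For readers who haven't met it, Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming version in Python, written for illustration only; it is not the code that was added to strings.bif, and the sample domain names are invented.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a                       # keep the inner row as short as possible
    previous = list(range(len(b) + 1))    # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                     # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


# A lookalike domain is a single substitution away from the real one.
print(levenshtein("google.com", "goog1e.com"))
print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.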

Gilbert Clark has a practical example of a plugin he wrote here. His plugin measures overhead around plugin hooks within Bro's core. The code itself is quite confusing if you've never looked at all the custom C++ things Bro does. Luckily, the Bro team has posted a walkthrough for plugin development here. Let's walk through the steps together, making a few adjustments to the original documentation.

First, let's clone Bro from git (I like to do a clone each time I want to try something new with Bro to ensure I have the freshest of code) and compile Bro.

git clone --recursive
cd bro/

If something fails, try searching the Bro project page for guidance. Documentation is a little haphazard, so use some Google-fu. Next, change into the auxiliary plugins directory in the bro repository, make a new directory for your plugin and initialize the directory with the init-plugin helper script provided. You'll need to pick a C++ namespace and plugin name before initializing the directory.

cd aux/bro-aux/plugin-support
../init-plugin   # this prints usage


Next, code up the plugin's functionality. Again, if you're not familiar with C++ and the custom things Bro does in it, you'll need to read some of the *.cc and *.h files used to create Bro's core. The following is a modified version of what is shown here and goes in the my_plugin_name.bif file.

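module CaesarCipher;

function rot13%(s: string%) : string
    %{
    // Work on a private copy so the script-level string is left untouched.
    char* rot13 = copy_string(s->CheckString());

    for ( char* p = rot13; *p; p++ )
        {
        // Only rotate ASCII letters; leave digits, spaces, etc. as they are.
        if ( ! isalpha(*p) )
            continue;

        char b = islower(*p) ? 'a' : 'A';
        *p  = (*p - b + 13) % 26 + b;
        }

    return new StringVal(new BroString(1, byte_vec(rot13), strlen(rot13)));
    %}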
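
To know what output to expect from the plugin before compiling anything, the same rotation can be sketched outside of Bro. The Python below mirrors the arithmetic of the loop above; the function is a stand-in written for this purpose, and leaving non-letters untouched is a choice made in this sketch.

```python
def rot13(text):
    """Rotate ASCII letters by 13 places; leave every other character alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)    # digits, spaces and punctuation pass through
            continue
        out.append(chr((ord(ch) - base + 13) % 26 + base))
    return "".join(out)


print(rot13("Hello"))
```

Because 13 is half of 26, applying the function twice returns the original string, which makes for a quick sanity check.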

You should also provide a description for your plugin (not required) in the file.

nano src/my_plugin_name.bif
nano src/

Once you have your plugin all coded up nicely (you should still be in %BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME), run configure with the bro directory. Then run make.

./configure --bro-dist=%BRO_DIR%
make

Bro's use of cmake should take care of all the details for you quite nicely. If you've copy+pasta'd my code (or written your own) correctly, then all should compile just fine. We need to now tell Bro where to find the plugin we just compiled and check that Bro can find and use the plugin.

export BRO_PLUGIN_PATH=$BRO_PLUGIN_PATH:%BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME
%BRO_DIR%/build/src/bro -N
%BRO_DIR%/bro -e 'print CaesarCipher::rot13("Hello")'

At this point Bro is compiled and our plugin is compiled, but neither is installed. This can be done with:

sudo make install

Unset the environment variable we named BRO_PLUGIN_PATH and make sure we can still use our plugin.

%BRO_DIR%/bro -e 'print CaesarCipher::rot13("Hello")'

Hurray! To see where our plugin was installed, run:

ls /usr/local/bro/lib/bro/plugins/

And to distribute the plugin via source, from the %BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME directory run:

make sdist
ls ./build/sdist

bro -NN lists all available plugins Bro knows about and their namespaces. I'm not sure what the difference between plugins and bifs is, but it seems they are different given that bifs are not listed in bro -NN.

At some point, it might be wise if someone grep'd through Bro's source and found all the functions NOT used within base and policy, removed them from Bro's core, and placed them into plugins.

Bro Training and Lab Material

I recently presented at OpenDNS's S4 event where I spoke about using Bro. Linked below are the slides I presented and the lab material I created for this event. I encourage people to work through the labs to gain experience programming Bro and to get an understanding of all it can do.

OVA (with Bro installed and all labs ready to be worked on)
OVA MD5 (for ensuring your OVA download is correct)

Sunday, March 30, 2014

Why a New Language?

I recently had a conversation about Bro with someone I will describe as a security connoisseur. He asked me what the benefits of Bro were given the ubiquity and age (insinuating some correlation between age and reliability) of Snort. This comparison often comes up in discussions when I mention Bro. I promptly pointed out that Bro is just as old as Snort and that Bro's DSL provides flexibility for stream analysis which Snort cannot provide. Stating that new programming languages are only adopted when they solve a new problem, he asked me what problem the Bro language solves. My immediate answer was native network data types. This hastily provided answer was incomplete and didn't convince him. The connoisseur responded that native network data types are only a matter of mathematical conversions and aren't anything special.

I've been rolling the question around in my head for a bit now. What new problem does the Bro language solve and why should it be used over an existing language? First off, it's a domain specific language and is not intended to be used as a general purpose language, like Python. Secondly, I wouldn't say the language solves a new problem. The language (and the platform it is tied to) solves an old problem, that of determining what is occurring on a network, in a new way.
The Bro language provides these solutions:
  • Both packet analysis and connection analysis are available
  • The ability to analyze high performance links without needing to know C/C++
  • Native network data types (I still consider this a benefit)
    • if ( in
    • local http: port = 80/tcp
  • A single interface to the otherwise complex tasks of reassembling streams, parsing protocol fields, defining policies and acting on those policies
    • As opposed to gluing different tools (tcpdump, Snort, Razorback, pynids, etc.) together
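
The connoisseur's point about mathematical conversions and the case for native types can both be seen in one small comparison. The sketch below uses Python's ipaddress module as a stand-in for Bro's addr and subnet types, so it is an analogy rather than Bro code; the helper names and sample addresses are invented.

```python
import ipaddress


def ipv4_to_int(dotted):
    """Convert a dotted-quad string to its 32-bit integer value."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {dotted}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def in_subnet_by_hand(addr, cidr):
    """Subnet membership using only integer math."""
    network, prefix_len = cidr.split("/")
    mask = (0xFFFFFFFF << (32 - int(prefix_len))) & 0xFFFFFFFF
    return (ipv4_to_int(addr) & mask) == (ipv4_to_int(network) & mask)


def in_subnet(addr, cidr):
    """The same check with address and network types doing the work."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr)


print(in_subnet_by_hand("192.168.1.10", "192.168.0.0/16"))
print(in_subnet("192.168.1.10", "192.168.0.0/16"))
```

The math really is short, which is the connoisseur's point. The argument for native types is that the last function reads like the policy it implements, and nobody has to get the masking right again in every script.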
Hopefully this post provides a more verbose answer than simply "native network data types".