Sunday, October 5, 2014

Out of the Scripts and Into the Core

In the previous post I discussed plugins in Bro and how they could be used to move code from the core to self-contained directories. After talking with Seth, I realized plugins had been around in Bro for a while (longer than I thought). I had no idea, as most of my experience with Bro was in scriptland. The input framework was built with Bro's plugin system and that's been around for a bit.
Seth suggested I try to move the geoIP lookup features of Bro into a plugin. I immediately thought, with my great new understanding of plugins, it would be quite simple. The geoIP lookup code relies on libgeoip and the legacy MaxMind C API. It exposes a few additional built-in functions in bro.bif and makes use of those functions in a few base/policy scripts. I thought it would be easy, involving a few copy/paste commands and moving some script files around. It turns out the job requires knowing some C++ and being more familiar with Bro's internals than I initially thought.
I can read C++ and write some very basic "Hello World" type programs; that is the extent of my C++ knowledge. I also had never really dug deep into Bro's source before. I did do this; looking back, however, that was relatively trivial. After grep'ing through much of the src/ directory, I decided I should document some key findings about the core before attempting to modify it. The following is meant as rough notes and references on C++ and the source which comprises the core.

Generic C++ Programming Concepts
C++ has been around for a while. These concepts aren't anything new to many and might seem very rudimentary to some. I approach Bro from the perspective of someone with relatively strong systems, networking, and security experience. I would say my understanding of programming, especially in C++, isn't as refined.
- C++ is a statically typed compiled language. It can be thought of as an extended version of C.
- Things need to be declared before they are used. This is fairly common across languages I'm used to.
- All things have a scope. Global vs local scope is important and influences references. Local takes precedence.
- Everything has a namespace. Namespaces are a mechanism for grouping things so that we don't have to dump everything into a global scope. Python has namespaces in packages (e.g. the match function in re) and Bro has namespaces as modules. In fact modules in scriptland translate to C++ namespaces.
- C++ is object oriented, which means classes. Classes are data type definitions. A thing of a particular class is called an object. This isn't specific to C++. Classes have member functions; in Python, for example, the string class has a split() function. Some class variables and methods are public and some are private, which determines how the variable or method can be accessed.
- C++ supports (multiple) class inheritance. Classes can inherit structure from base classes. An example of this is a class that describes a bluebird. The bluebird class inherits attributes and structure of the bird class and, in turn, the bird class inherits attributes and structure of the animal class. Class inheritance provides a generalized way of writing classes that maximizes code reuse.
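The bluebird example above can be sketched in a few lines. This is a minimal illustration written in Python (the language these notes keep comparing against) rather than C++, and the class and method names are made up for the example. One difference to keep in mind: C++ enforces public/private at compile time, while Python only marks private members by convention.

```python
class Animal:
    """Base class: every animal has a name and can describe itself."""

    def __init__(self, name):
        self.name = name      # public attribute
        self._alive = True    # leading underscore: private by convention only

    def describe(self):
        return f"{self.name} is an animal"


class Bird(Animal):
    """Inherits structure from Animal and extends describe()."""

    def describe(self):
        # Reuse the parent's behaviour instead of rewriting it.
        return super().describe() + " that can fly"


class Bluebird(Bird):
    """Inherits from Bird, which in turn inherits from Animal."""

    def describe(self):
        return super().describe() + " and is blue"


print(Bluebird("Sialia").describe())
```

Each class only writes the part that is new, which is the code reuse the last bullet refers to.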
The Core
Much of Bro's source consists of class definitions. By reading (well, OK, skimming) through the source (well, OK, the header) files, I came up with four buckets that most of the code in Bro's core falls into. These aren't meant to be comprehensive. I didn't write the core so don't take my word for it. Instead, go read it for yourself.
1) Process network traffic
2) Glue scriptland to the core
3) Provide infrastructure the core needs
4) Offer other features

1) Network traffic processing
Bro's main source of awesome comes from its traffic analysis capabilities. Similar to humans and pantlegs, Bro has to do this analysis packet by packet just like any other network sniffer. The code I found of interest in this bucket included the analyzers (duh), which deserve their own blog post entirely, and the NetSessions class and how it is used to interface with packet processing. The Frag.* and Reassem.* files help out here as well. The *.pac files in the src/ directory tie binpac analyzers and Bro together. One thing that surprised me is that Bro can still do some neat things without processing any traffic. It can be called with a script and no interface, much like other scripting language interpreters.

2) Scriptland
The interfaces Bro provides for scriptland to control how Bro behaves are extremely intriguing. I've never lifted the hood to see how scripting languages are implemented before this exercise. I'd be curious how other languages, like Ruby, do things. Scriptland functionality can be broken down into an additional three subtypes.
a) Interpreting the scripts
b) Script object abstraction
c) Exposing things the core can do for scriptland

a) Lexical Analysis
I had to do some reading to really understand what was going on here... Bro scripts need to be read by the core and interpreted before they take effect. Similar to how you are reading this right now, given a set of rules a computer can be taught to read source code. Written English's rules include: sentences read left to right, ideas stop at periods, words end with spaces, etc. Policy scripts don't have words or sentences but do have their own grammar. For example, expressions make up statements, which are semicolon delimited and make up frames, which are curly brace delimited. Bro's core needs to represent these things to be able to interpret them. Base classes like Expr, Stmt, Frame, and Scope offer structures which can be specialized through class inheritance to provide different kinds of expressions or statements (if statement, print statement, for statement, etc.). The Location class provides line and column numbers for debugging interpreted scripts.
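To make the idea of base classes for expressions and statements concrete, here is a toy interpreter sketch in Python. It borrows only the names Expr and Stmt from Bro's source. Everything else (Const, Name, PrintStmt, IfStmt, the dictionary used as a frame) is invented for illustration and is not how Bro's core is actually written.

```python
class Expr:
    """Base class for anything that evaluates to a value."""
    def eval(self, frame):
        raise NotImplementedError


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, frame):
        return self.value


class Name(Expr):
    """Looks a variable up in the current frame."""
    def __init__(self, ident):
        self.ident = ident

    def eval(self, frame):
        return frame[self.ident]


class Stmt:
    """Base class for anything that is executed for its effect."""
    def exec(self, frame, out):
        raise NotImplementedError


class PrintStmt(Stmt):
    def __init__(self, expr):
        self.expr = expr

    def exec(self, frame, out):
        out.append(str(self.expr.eval(frame)))


class IfStmt(Stmt):
    def __init__(self, cond, body):
        self.cond, self.body = cond, body

    def exec(self, frame, out):
        # Only run the body when the condition expression is true.
        if self.cond.eval(frame):
            for stmt in self.body:
                stmt.exec(frame, out)


# Roughly: if ( verbose ) { print "hello"; print port; }
program = [IfStmt(Name("verbose"),
                  [PrintStmt(Const("hello")), PrintStmt(Name("port"))])]
frame = {"verbose": True, "port": 80}
output = []
for stmt in program:
    stmt.exec(frame, output)
print(output)
```

Adding a new kind of statement means adding one more subclass of Stmt, which is the appeal of the inheritance-based layout.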

b) Object Abstraction
Variables and data in scriptland need to be represented in C++ for the core to work with them. Variables and constants (scriptland native data types) are passed between policy scripts and Bro's core as pointers to Val objects. Scriptland defined types are created through the BroType interface. BroType creates BroObj (a base class with many children) objects. Event handlers are referenced between the core and scriptland as EventHandlerPtr objects. The boolean value of an EventHandlerPtr indicates if scriptland references it anywhere. I'm assuming this is used for optimizing callbacks in the core. Modules in scriptland are C++ namespaces in the core, where GLOBAL is the global default namespace. Below is a rough mapping of scriptland things to C++ classes.

Scriptland                The Core
everything                SerialObj
pretty much everything    BroObj
all values                Val
specific values           *Val (EnumVal, PortVal, AddrVal, RecordVal, TableVal, etc.)
function                  Func
event                     Event, EventHandlerPtr
attribute                 Attr (&expire, &redef, &priority, etc.)
connection                Connection
uid                       UID
type                      BroType
when                      Trigger
Reporter                  Reporter

c) Bifs
Built-in function (Bif) files declare and define functions that are available from scriptland but are implemented in C++. They are essentially C++ source files with a special escape syntax. Events, types, and constant variables used in scriptland are also declared and/or defined in Bifs.

3) Core Infrastructure
Some portions of the core don't process traffic, interpret scripts or provide cool features. Classes like Timers and Managers (*Mgr) are needed for managing Bro's complex internals but generally aren't immediately apparent to people who live in scriptland. Other things like BSD style getopts, signals, pipes, and custom queues are fundamental things for Bro being successful even though they are abstracted away.

4) Other features
This category is sort of a catch-all. If it's not traffic processing or script interpreting, but is available to a Bro user or programmer, it's probably in this bucket. Some features in this bucket are implemented as plugins, like the input, probabilistic, logging and file_analysis frameworks. Some features are tied into the core of Bro (which doesn't make sense to me, but could just be legacy) like IP anonymization (Anon.*) and IP geolocation.

I think moving away from libgeoip's C API would be smart. It would remove a dependency and move the code out of the core and into a plugin. It may even provide an opportunity to upgrade to MaxMind's second version file type (mmdb). However, that is easier said than done. I'm still working on it... Maybe I'll figure it out before I get distracted and start playing with Binpac++.

Monday, September 22, 2014

Bro Plugins

A new paradigm has been floating around the development branches of Bro: plugins. Not broctl plugins but compiled plugins dynamically available to the Bro binary at run time. These plugins are similar to Apache modules. I fully support Bro plugins. They will hopefully reduce the size of the core. They will hopefully make creating a Bro package management solution much easier. That would make extending and customizing Bro much simpler.

I had at one point wanted a Levenshtein distance function available to me from scriptland. I ended up writing it into the core by extending the strings.bif file before compiling Bro. This provided me what I wanted, but was rather difficult to do. The new plugin architecture will allow me to remove this functionality from the core (I honestly think I'm the only person that has used this function) and distribute it as a plugin, similar to a Python module.
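For anyone who hasn't met Levenshtein distance before, it counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the textbook dynamic-programming version written in Python for illustration. It is not the C++ I added to strings.bif.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

A function like this is handy for spotting look-alike domain names, for example, since one stray character gives a distance of 1.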

Gilbert Clark has a practical example of a plugin he wrote here. His plugin measures overhead around plugin hooks within Bro's core. The code itself is quite confusing if you've never looked at all the custom C++ things Bro does. Luckily, the Bro team has posted a walkthrough for plugin development here. Let's walk through the steps together, making a few adjustments to the original documentation.

First, let's clone Bro from git (I like to do a clone each time I want to try something new with Bro to ensure I have the freshest of code) and compile Bro.

git clone --recursive https://github.com/bro/bro.git
cd bro/
./configure
make


If something fails, try searching the Bro project page for guidance. Documentation is a little haphazard, so use some Google-fu. Next, change into the auxiliary plugins directory in the bro repository, make a new directory for your plugin, and initialize the directory with the provided init-plugin helper script. You'll need to pick a C++ namespace and plugin name before initializing the directory.

cd aux/bro-aux/plugin-support
mkdir MY_PLUGIN
cd MY_PLUGIN
../init-plugin   # this prints usage

../init-plugin MY_Cpp_NAMESPACE MY_PLUGIN_NAME

Next, code up the plugin's functionality. Again, if you're not familiar with C++ and the custom things Bro does in it, you'll need to read some of the *.cc and *.h files used to create Bro's core. The following is a modified version of what is shown here and goes in the my_plugin_name.bif file.

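module CaesarCipher;

function rot13%(s: string%) : string
    %{
    // Work on a private copy of the script-level string.
    char* rot13 = copy_string(s->CheckString());

    for ( char* p = rot13; *p; p++ )
        {
        // Leave digits, spaces, and punctuation untouched.
        if ( ! isalpha(static_cast<unsigned char>(*p)) )
            continue;

        // Rotate by 13 within the letter's own case.
        char b = islower(static_cast<unsigned char>(*p)) ? 'a' : 'A';
        *p  = (*p - b + 13) % 26 + b;
        }

    // Hand the rotated buffer back to scriptland as a string value.
    return new StringVal(new BroString(1, byte_vec(rot13), strlen(rot13)));
    %}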
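
The arithmetic in the loop is easy to check outside of Bro. Here is a small Python sketch of the same rotation, written only to verify the expected output. It rotates ASCII letters and passes everything else through unchanged, which is what one would normally expect from rot13.

```python
def rot13(s):
    """Rotate ASCII letters by 13 places, leaving other characters alone."""
    out = []
    for ch in s:
        if ch.isascii() and ch.isalpha():
            # Same idea as the bif: pick the base for the letter's case,
            # shift by 13, and wrap around the 26-letter alphabet.
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + 13) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


print(rot13("Hello"))  # prints Uryyb
```

Applying the function twice returns the original string, which makes for a quick sanity test of the plugin once it is loaded.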


You should also provide a description for your plugin (not required) in the Plugin.cc file.

nano src/my_plugin_name.bif
nano src/Plugin.cc

Once you have your plugin all coded up nicely (you should still be in %BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME), run configure with the bro directory. Then run make.

./configure --bro-dist=%BRO_DIR%
make


Bro's use of cmake should take care of all the details for you quite nicely. If you've copy+pasta'd my code (or written your own) correctly, then all should compile just fine. We need to now tell Bro where to find the plugin we just compiled and check that Bro can find and use the plugin.

export BRO_PLUGIN_PATH=$BRO_PLUGIN_PATH:%BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME
%BRO_DIR%/build/src/bro -N
%BRO_DIR%/bro -e 'print CaesarCipher::rot13("Hello")'


At this point Bro is compiled and our plugin is compiled, but neither is installed. This can be done with:

sudo make install

Unset the environment variable we named BRO_PLUGIN_PATH and make sure we can still use our plugin.

unset BRO_PLUGIN_PATH
%BRO_DIR%/bro -e 'print CaesarCipher::rot13("Hello")'

Hurray! To see where our plugin was installed, run:

ls /usr/local/bro/lib/bro/plugins/

And to distribute the plugin via source, from the %BRO_DIR%/bro/aux/bro-aux/plugin-support/MY_PLUGIN_NAME directory run:

make sdist
ls ./build/sdist


bro -NN lists all available plugins Bro knows about and their namespaces. I'm not sure what the difference between plugins and bifs is, but it seems they are different given that bifs are not listed in bro -NN.

At some point, it might be wise if someone grep'd through Bro's source and found all the functions NOT used within base and policy, removed them from Bro's core, and placed them into plugins.

Bro Training and Lab Material

I recently presented at OpenDNS's S4 event where I spoke about using Bro. Linked below are the slides I presented and the lab material I created for this event. I encourage people to work through the labs to gain experience programming Bro and to get an understanding of everything it can do.

Slides
OVA (with Bro installed and all labs ready to be worked on)
OVA MD5 (for ensuring your OVA download is correct)

Sunday, March 30, 2014

Why a New Language?

I recently had a conversation about Bro with someone I will describe as a security connoisseur. He asked me what the benefits of Bro were given the ubiquity and age (insinuating some correlation between age and reliability) of Snort. This comparison often comes up in discussions when I mention Bro. I promptly pointed out that Bro is just as old as Snort and that Bro's DSL provides flexibility for stream analysis which Snort cannot provide. After stating that new programming languages are only adopted when they solve a new problem, he asked: what problem does the Bro language solve? My immediate answer was native network data types. This hastily provided answer was incomplete and didn't convince him. The connoisseur responded that native network data types are only a matter of mathematical conversions and aren't anything special.

I've been rolling the question around in my head for a bit now. What new problem does the Bro language solve and why should it be used over an existing language? First off, it's a domain specific language and is not intended to be used as a general purpose language, like Python. Secondly, I wouldn't say the language solves a new problem. The language (and the platform it is tied to) solves an old problem, that of determining what is occurring on a network, in a new way.
The Bro language provides these solutions:
  • Both packet analysis and connection analysis are available
  • The ability to analyze high performance links without needing to know C/C++
  • Native network data types (I still consider this a benefit)
    • if (192.168.1.1 in 192.168.1.0/24)
    • local http: port = 80/tcp
  • a single interface to the otherwise complex tasks of reassembling streams, parsing protocol fields, defining policies and acting on those policies
    • As opposed to gluing different tools (tcpdump, Snort, Razorback, pynids, etc.) together
Hopefully this post provides a more verbose answer than simply "native network data types".
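
For comparison, here is roughly what the two native-type bullets above look like in a general purpose language. This is a small Python sketch using the standard ipaddress module. The tuple used for the port is just one way to model it, since Python has no built-in port type.

```python
import ipaddress

# Bro: if (192.168.1.1 in 192.168.1.0/24)
addr = ipaddress.ip_address("192.168.1.1")
net = ipaddress.ip_network("192.168.1.0/24")
print(addr in net)  # prints True

# Bro: local http: port = 80/tcp
# No native equivalent, so the number and transport protocol are paired by hand.
http = (80, "tcp")
```

It works, and the connoisseur is right that it is "only" conversions, but the addresses arrive as strings that must be parsed first, and nothing stops a port from being compared against a bare integer by mistake.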

Sunday, March 9, 2014

Learning the Hard Way


I learn programming in two ways. I learn what works and try to repeat it. I learn what breaks and try to avoid it. If you want to learn how to do something, learn all ways to not do that something.
This may be obvious, but learning to write Bro means learning another programming language. The Bro scripting language is somewhere in between C++ and Python and, like any language, comes with its own pitfalls, quirks, and learning curve.
Here is a list of gotchas I've run across while learning to write Bro scripts.

The signature framework shouldn't be used the way you might think.
When I first saw Bro had a signature framework I thought I'd dump Emerging Threats Snort rules into it somehow and everything would be great. That's not the case. While Bro's signature framework is powerful, Bro isn't Snort. Bro uses signatures, similar to Snort, to identify something specific. A possible cleartext shell, a possible SSN, or possible HTTP traffic are all examples. Bro doesn't just alert when a signature matches though. Bro can pass information about the connection which matched to script land. For example, to detect protocols on non-standard ports, Bro uses signatures to analyze and tag connections. Those tags indicate to other scripts how to handle the connection.

Don't change the provided scripts.
This one is pretty simple. The Bro team provides a collection of scripts with the project. Don't change them in place. Most values and variables in those scripts can be redefined in other places. Instead, write your own scripts and load them into Bro via %PREFIX%/share/bro/local.bro or by specifying them when invoking Bro via the CLI. Installing Bro will overwrite the built-in scripts.

For loops are random access.
For loops in Bro scripts are more like iterators in Python, as opposed to C-style for loops. Forget about counters. It is possible to hack up counters in for loops, but problems usually can be solved without them. For loops, like the majority of things in Bro, don't occur in order. Which leads into the next quirk.

Everything in scriptland is asynchronous.
This is a blessing, as long as you understand the paradigm. Events in scriptland don't happen in the order they appear in the file (unlike a procedural Bash script, for example). The same event can be handled in multiple different Bro scripts, which get loaded in different ways. All the handlers with the same event name run together when the event fires (ignoring priority). This is really powerful if you think about it. On a saturated link Bro can't wait for some event handling body to finish; it has more packets to process.
Beware: the input framework is asynchronous too. Bro can't wait for the input framework to finish reading a file from disk; it has packets to process (even if those packets are in a tiny pcap). This can lead to the packet processing finishing before some input processing.
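One way to picture handlers for the same event living in different scripts is a small dispatcher. The Python sketch below is a toy model, not Bro's event engine: the on() decorator, the dispatch() function, and the handler names are all invented. It only illustrates that file order doesn't decide who runs first and that a higher priority handler runs earlier.

```python
from collections import defaultdict

# event name -> list of (priority, handler function)
handlers = defaultdict(list)


def on(event, priority=0):
    """Register a handler for an event, as a script would on load."""
    def register(func):
        handlers[event].append((priority, func))
        return func
    return register


def dispatch(event, *args):
    # Higher priority runs first. Ties keep registration order here, which
    # is a property of this sketch and not something to rely on in Bro.
    for _, func in sorted(handlers[event], key=lambda h: -h[0]):
        func(*args)


log = []


@on("connection_established")
def from_script_a(conn):
    log.append(f"a saw {conn}")


@on("connection_established", priority=5)
def from_script_b(conn):
    log.append(f"b saw {conn}")


dispatch("connection_established", "C1")
print(log)  # prints ['b saw C1', 'a saw C1']
```

Script b was registered second but runs first, so any handler that depends on another handler's side effects needs an explicit priority rather than a lucky load order.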

Most of the documentation is in the source.
It's difficult to write solid C++ and Python, let alone solid documentation about solid C++ and Python. This is slowly changing and more documentation is being released. I found the Bro Exchange and Bro Workshop videos extremely helpful. To learn Bro scripting concepts, learn to navigate the source (and learn your grep flags).

Connection IDs are mostly forever unique.
Running Bro on a trace file generates nice log files with connections indexed by unique connection identifiers. Running Bro twice on one trace file will generate different connection UIDs. This makes working back from logs to pcaps a bit challenging.

Protocol analyzers are a challenge to write.
Perhaps it is because I'm not a seasoned C++ developer, but creating new protocol analyzers is difficult. The current analyzers are, however, quite robust. Just beware when your traffic is an edge case to the protocol spec. I've run into a couple of instances where a few HTTP streams used a combination of x0d x0a Bro could not identify. The connection didn't end up in the HTTP log file as I expected.
Raw packet contents are exposed to scriptland through events such as packet_contents. However, protocol analyzers written in scriptland are expensive in terms of processing.

Scale.
Most of the challenges listed above can make Bro scripting difficult to navigate at times. However, they ultimately exist because Bro is built to scale. Any state a script keeps in scriptland has the potential to be harmful as network traffic is analyzed. Beware global variables and data that never expires. Beware the order of events and processing.
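
To show why unexpired state hurts, here is a minimal Python sketch of a table whose entries age out, loosely modeled on what Bro's &create_expire attribute does for tables and sets. The ExpiringTable class and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class ExpiringTable:
    """A table that forgets entries older than ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # key -> (value, time the entry was created)

    def add(self, key, value, now):
        self._entries[key] = (value, now)

    def expire(self, now):
        # Collect stale keys first so the dict isn't modified while iterating.
        stale = [k for k, (_, created) in self._entries.items()
                 if now - created >= self.ttl]
        for k in stale:
            del self._entries[k]

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)


seen_hosts = ExpiringTable(ttl=60)
seen_hosts.add("10.0.0.1", "first seen", now=0)
seen_hosts.add("10.0.0.2", "first seen", now=50)
seen_hosts.expire(now=70)
print(len(seen_hosts))  # prints 1
```

Without the expire step the table grows with every new host for as long as Bro runs, which on a busy link is exactly the kind of harmful state described above.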

If scripts are written by programmers unaware of potential pitfalls, then unexpected failures can occur. Hopefully this list will assist script developers to further understand Bro and write bug-free, scalable scripts.

Monday, February 3, 2014

Conditional Intelligence

Intel Framework Background

Bro's intelligence framework allows programmers to build tables of indicators for Bro to use while monitoring a network. Indicators come in different types. For example, Intel::DOMAIN, Intel::SOFTWARE, and Intel::EMAIL are all indicator types. Given an indicator with a type of Intel::DOMAIN, Bro will monitor for occurrences of that domain in all locations a domain could be. Bro's intel framework also allows an indicator to have a location specified with it. For example, Bro is able to differentiate between 'evil.com' within an HTTP host header or 'evil.com' within a DNS qname. Likewise, an Intel::SOFTWARE indicator can monitor for an HTTP user agent value, SMTP user agent value, or a server response version value.

Bro is able to do these types of tasks easily because of how Bro handles streams. Bro parses protocols it recognizes and creates objects for each connection (an originating and responding stream) as the connection occurs. Protocols have predefined (but redefinable) structures associated with them (Bro uses the internal record data type for this). For examples of what identifiers are within a connection, try printing a connection object to STDOUT from a connection_state_remove event. A connection object is a container not only for other protocol objects, but also for additional fields used internally by different Bro scripts. For example, look at how the c$dns$ready value is used within %PREFIX%/bro/share/bro/base/protocols/dns/main.bro, which handles creating parts of the DNS object within connection objects.

The intel framework essentially allows for the monitoring of indicators (string values) within a selection of object identifiers, aka fields. This is very powerful. A large number of object identifiers are provided by Bro out of the box. Additionally, objects are redefinable by programmers, and fields/identifiers within them don't need to be extracted from values within the connection; they can be derived from any other source within scriptland (any other identifier/field added to protocol records with redef statements).
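
The type-plus-location idea can be modeled with a plain lookup table. The Python sketch below is only a mental model: the seen() function, the location labels, and the data layout are all made up for illustration and do not mirror the intel framework's real data structures.

```python
# (indicator value, indicator type) -> locations the indicator applies to.
# An empty set means "anywhere a value of that type can appear".
indicators = {
    ("evil.com", "DOMAIN"): {"http_host_header"},
    ("badtool/1.0", "SOFTWARE"): set(),
}


def seen(value, indicator_type, where):
    """Return True if the observed value matches an indicator at this location."""
    locations = indicators.get((value, indicator_type))
    if locations is None:
        return False  # no such indicator
    return not locations or where in locations


print(seen("evil.com", "DOMAIN", "http_host_header"))  # prints True
print(seen("evil.com", "DOMAIN", "dns_query"))         # prints False
```

The first indicator only fires for the HTTP host header, while the second fires wherever a software string is observed.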


Conditional Intelligence (aka rules)

I've been playing around with Bro's intelligence framework the past month or so and have created a prototype extension for it that allows conditional rules to be built on top of indicators. Inspired in part by Yara's combination of strings and conditions and IOC's search capability, this extension's goal is to allow for complex situations to be recognized by Bro without the need to write and rewrite different event handlers. For example, consider HTTP connections to 'www.google.com' with a user agent of 'Wget/1.13.4 (linux-gnu)', a URI path of '/foo' or '/bar', and a return status code of 404. Generating this type of connection is easily done by running

wget www.google.com/foo

from the command line. Identifying this within a Bro script is easily done with event pseudo code such as:

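event connection_state_remove(c: connection)
{
    # Skip connections without HTTP state. The fields read below are
    # optional, so confirm they exist before touching them.
    if ( ! c?$http ) return;
    if ( ! c$http?$host || ! c$http?$uri ||
         ! c$http?$user_agent || ! c$http?$status_code ) return;

    if ( (c$http$host == "www.google.com") &&
         (/^\/foo/ in c$http$uri || /^\/bar/ in c$http$uri) &&
         (c$http$user_agent == "Wget/1.13.4 (linux-gnu)") &&
         (c$http$status_code == 404)
       )
    {
        print "found a connection";
    }
}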

If one day Google decides to change their server's HTTP response status code from a 404 to a 401, the above code needs to be modified or a new event handler has to be written. Maintaining a list of rules and indicators would be much easier than maintaining sets of event handling scripts. The above example using Google and wget is already built into the Rules extension's indicator and rules files found here. Rules can consist of: only indicators, nested rules (yo dawg, rules of rules), or mixed rules (rules of indicators and rules). By building a framework where indicators and rules can be reused and recombined, the amount of script writing needed to identify when something very specific happens can be reduced.
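
To illustrate rules built from indicators and other rules, here is a small Python sketch of how such a rule tree could be evaluated. The rule names, the "op"/"items" layout, and the evaluate() function are hypothetical and chosen for this example. They are not the extension's actual file format or code.

```python
def evaluate(rule, matched, rules):
    """Return True if the named rule holds for the set of matched indicator ids."""
    spec = rules[rule]
    results = []
    for item in spec["items"]:
        if item in rules:
            # Nested rule: evaluate it recursively (rules of rules).
            results.append(evaluate(item, matched, rules))
        else:
            # Plain indicator id: it holds if the indicator was seen.
            results.append(item in matched)
    return all(results) if spec["op"] == "and" else any(results)


rules = {
    # A rule made only of indicators.
    "uri_foo_or_bar": {"op": "or", "items": ["uri_foo", "uri_bar"]},
    # A mixed rule: three indicators plus the nested rule above.
    "wget_probe": {"op": "and",
                   "items": ["host_google", "ua_wget", "status_404",
                             "uri_foo_or_bar"]},
}

# Indicator ids that matched on one connection.
matched = {"host_google", "ua_wget", "status_404", "uri_foo"}
print(evaluate("wget_probe", matched, rules))  # prints True
```

If the status code changes from 404 to 401, only the indicator data behind "status_404" needs to change. The rule definitions and the evaluation code stay the same.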

Currently, the largest limitation of the extension is scalability. Unique identifiers for indicators were needed for rules to reference. This was accomplished by expanding the available metadata fields associated with indicators (provided by Bro's intel framework). Unfortunately, Bro does not distribute metadata to all worker nodes (for good reasons). This limits the Rules extension to standalone instances only. Additionally, Bro's intel framework only supports string indicators. I have yet to determine if not having regular expressions is a hindrance or a blessing. I'll keep everyone posted :)

Friday, January 31, 2014

A Look at Applied NSM

After my interview with the authors of Applied Network Security Monitoring, lead author Chris Sanders was kind enough to provide me with a preview release of the book. If you haven't picked up a copy yet, I strongly recommend you do.

Applied NSM provides readers with a process for building a production-ready network monitoring system from the ground up. From building sensors, to tuning them, and ultimately to answering, "what do I do with this data and these alarms?", Sanders et al. have laid out a solid foundation for analysts entering the network security profession. Personally, I found the sections on Bro (duh) and open source intelligence the most beneficial. I felt Bianco's practical example of using Bro to monitor a netblock's unused IP space was a fantastic way to learn the basics of Bro scripting.

Not only is the book great reference material, all the proceeds go to charity, which is quite outstanding. Great job, guys!