Friday, August 9, 2013

Extending Bro's Core

If you've made it this far, congrats: you now know more about Bro than almost everyone, ever. Bro is a great framework and has been going through tremendous growth in the past few years. New modules, new functionality, and new features seem to have been built into Bro every time I hear about it. If you use Bro enough, you may eventually find that it doesn't have everything you need. Luckily, Bro is open source, so you can add functionality to it yourself.

The Levenshtein distance algorithm is often used by spell checkers to determine if a word is a misspelling of another word in a dictionary. The algorithm takes two strings as input and calculates how different the strings are by counting the number of insertions, deletions and substitutions it takes to transform one string into the other. For example, turning "kitten" into "sitting" takes three edits. Let's see how one would go about implementing this in Bro's C++ core as a built in function for use in scriptland.

Hopefully you still have the directory you created from cloning the Bro git repo in Building and Running Bro. If not, re-clone it with the following command.
    git clone --recursive git://git.bro-ids.org/bro

Open the file that defines built in functions that handle strings and have a look around.
    vi bro/src/strings.bif
This file contains specially crafted C++ that gets parsed and compiled into Bro when you run those ./configure, make, make install commands. Every string related bif available in scriptland is defined in this file. I essentially stumbled my way through this file as I don't truly know C++ or the extensions the Bro project has built into the core's C++ code.

The function here can be added to the src/strings.bif file and compiled into Bro. The way I came up with this new functionality was to
  1. Find a basic function and see how it works. I saw is_ascii as a good function to copy from
  2. Rip out what the function does and add your own logic. The core C++ requires you to use Bro defined functions such as Len() and Bytes()
  3. Be sure to wrap whatever you are returning to scriptland in a Val() or StringVal()
  4. Compile your added function into Bro while praying you did everything correctly
 Once Bro finishes recompiling, you should be able to call levenshtein_distance(s1: string, s2: string) from scriptland and receive a correct distance metric.

Intro to Brogramming - Frameworks

I have to be honest. I had to think for a bit about what the difference between a Bro module and a Bro framework was. I could be completely off base, but here's what I've concluded.

Frameworks in Bro provide global extensions to how Bro operates and what Bro can do. Frameworks can be thought of as shared libraries between parts of Bro. Frameworks are built on top of one or multiple modules. I like to think about modules in the context of what they expose to other scripts (their exports) while I consider frameworks as something that changes how Bro can be used from a high level/global perspective.

For example, the LogAscii, LogSQLite, and LogElasticSearch modules expose ways to log information Bro generates in different output formats. Together they help make up the logging framework which offers ways to log information in different ways to all of Bro. Bro's homepage has some frameworks listed. I have yet to write my own full featured framework and thus I cannot give an example of one. I again suggest turning to and reading the code. See the /usr/local/bro/share/bro/base/frameworks directory.

Bro frameworks include
  • Input - this framework provides a mechanism for Bro to read data (including Bro logs) into Bro for processing.
  • Intel - this framework provides a mechanism for Bro to monitor the entire network for a specific piece of intelligence. For example, a malicious domain could appear in a link within an email or chat, in an HTTP URL, or within a DNS request.
  • File Analysis - this framework allows for files seen on the network to be treated similarly to connections. Files can be carved from the wire, examined, hashed, or submitted to Team Cymru's Malware Hash Registry.
  • Logging - the logging framework supports other frameworks and modules by providing a mechanism to easily create a new log stream, filter an existing log stream, or define custom events that should occur when some piece of information is ready to be logged.
  • DPD (dynamic protocol detection) - DPD is very cool and probably the single reason I started to look at Bro. Bro uses a combination of standard ports, behavioral analysis and signatures to determine the protocol being used on the wire. If you've ever had to explicitly tell Wireshark which decoder to use, you have experienced the frustrations IRC running over port 8080 can create.
  • Notice - this framework provides a way for other Bro frameworks and modules to raise notices. Raised notices in scriptland are similar to raised events from the core.
  • Cluster - Networks get big and sometimes a single machine cannot handle all the traffic you may have to inspect. The cluster framework provides a mechanism for Bro to scale horizontally. Another nice thing about Bro is that it runs on commodity hardware. If a stand alone Bro (a single Bro process on a single system, which is how we've been running Bro for these blog posts) can't handle all your packets, just throw more cheap boxes at it.
  • Signatures - the signature framework provides a Snort style, packet by packet inspection mechanism. Signatures are mostly used by the DPD framework. As Bro focuses more on connections and streams than packets, signatures don't get used very often by brogrammers.
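
To make the Logging bullet above a little more concrete, here is a minimal sketch of a script that creates its own log stream. The module name DNSQueries and its Info record are made up for illustration. Log::create_stream and Log::write are the logging framework's functions as I understand them in Bro 2.x, so check the framework's main.bro if your version behaves differently.

```bro
# Hypothetical module for illustration; "DNSQueries" is not part of Bro.
module DNSQueries;

export {
    # Register a new stream ID with the logging framework.
    redef enum Log::ID += { LOG };

    # Each field marked &log becomes a column in the log.
    type Info: record {
        ts:    time   &log;
        query: string &log;
    };
}

event bro_init()
{
    # Tell the logging framework which record type describes the columns.
    Log::create_stream(DNSQueries::LOG, [$columns=Info]);
}

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
{
    # Write one log entry per DNS request seen.
    Log::write(DNSQueries::LOG, [$ts=network_time(), $query=query]);
}
```

If this works on your install, generating a few DNS requests while Bro runs should leave an extra .log file next to the usual ones.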

Intro to Brogramming - Modules

Bro modules are essentially namespaces, much like C++ namespaces. Modules allow grouping of related functions, variables, and events under a single context.

The GLOBAL module is the default namespace for all Bro scripts. If no module is defined in a Bro script, it is dropped into GLOBAL. Modules are defined whenever a brogrammer is extending Bro at the scripting layer to do something beyond what Bro currently does. Modules are basic building blocks for extending Bro via scriptland. Examples of modules include LogAscii, LogDataSeries, and LogSQLite. Each offers a different interface for logging information Bro generates.

Often times Bro modules are organized in the source tree in a somewhat logical manner and are given their own directory. A very basic module can be found here. To learn Bro, you have to read and understand its code. I highly recommend spending time exploring the /usr/local/bro/share/bro directory and files within it especially the base directory (which contains important scriptland modules and scripts) and the policy directory (which contains tweaks to the base scripts).

A module usually consists of the following
  • main.bro - The main script that defines the module and what it does.
  • __load__.bro - A special file Bro uses to load all required dependency scripts for a module. When a directory is loaded with @load, Bro looks for this file in that directory and loads it if the file is present.
  • helper scripts or data files - Other Bro scripts or data files that might be too long or specific to put into the main.bro file. Bro modules often work by chaining short script files (that do something specific) together.
From within a script file, the 'module' keyword sets the module (namespace) that the code following it belongs to, and the 'export' block declares the types, variables, and functions the module makes available to other scripts. Usually, the code in the rest of a module script is used to set values in the module's record (often for logging) or to accomplish other tasks the module intends to provide.
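
As a rough sketch of the 'module' and 'export' keywords described above, here is a tiny made-up module. The name Greeter and everything inside it are hypothetical and exist only for illustration.

```bro
# Hypothetical module for illustration; "Greeter" is not part of Bro.
module Greeter;

export {
    # Other scripts can read (and redef) this as Greeter::greeting.
    const greeting = "Sup bro, sup?" &redef;

    # Other scripts can call this as Greeter::greet().
    global greet: function(name: string): string;
}

# Declared outside the export block, so it is meant for use inside this module only.
global greet_count: count = 0;

function greet(name: string): string
{
    ++greet_count;
    return fmt("%s (%s)", greeting, name);
}

event bro_init()
{
    print greet("scriptland");
}
```

Saving this as main.bro in its own directory, next to a __load__.bro containing "@load ./main", would follow the layout listed above.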

To find the names of all modules in your version of Bro, try running the following command in the directory %BRO_PREFIX%/share/bro
grep -R '^module' ./* | cut -f2 -d':' | sort | uniq

Intro to Brogramming - Events

Bro is event driven. When something happens on the network, Bro's core will raise an event and execute all the code blocks from scriptland associated with that event. Just as Bro has built in functions available for brogrammers to use, Bro has built in events. These built in events cover most of the things a network operator would want to know about (again, the magic of the BSD license allows you to extend Bro as you see fit).

I'm not going to write any code examples for built in events as Liam Randall already has done an excellent job of doing that. His Github page has Bro scripts that keep tallies of the number of events that fire while Bro runs. Read through the scripts and try to understand them. They are repetitive as they do the same thing for each event.

In reality, Liam's fire scripts are great for understanding how Bro works, but are too simple to give any great insight into what is happening on the network. Pick an event he included from a file, one you can easily generate (like dns_request), and replace his code block with your own. Here's an example of something I want Bro to do when it sees a DNS request.

event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
{
    print fmt("DUDE! I just saw some bro query the DNS for %s with a query type of %d and a query class of %d", query, qtype, qclass);
}
Try running this script and writing your own. Add some local variables and calls to built in functions to the event.
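
As one possible take on that exercise, here is a sketch of the same handler with a few local variables and a built in function call. The variable names are my own, and to_upper is the string bif as I understand it from the strings bif file.

```bro
event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count)
{
    # The connection record carries the endpoints of the request.
    local client: addr = c$id$orig_h;

    # to_upper is a built in function; |query| is the length of the string.
    local shouted: string = to_upper(query);
    local len: count = |query|;

    print fmt("%s asked for %s (%d bytes long)", client, shouted, len);
}
```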

Intro to Brogramming - Built in Functions

Defining functions within a script is good and useful, but Bro has very useful functions already built into the C++ core for brogrammers to call from scriptland. These functions are called built-in functions.

Being an open source project, Bro gets contributions left and right from people who extend its functionality. The file located at /usr/local/bro/share/bro/base/bro.bif.bro has a list of all the functions available for general programming in Bro. /usr/local/bro/share/bro/base/strings.bif.bro has a list of all the string related functions Bro has built in. In fact, you should familiarize yourself with every .bif (built in function) file in that directory. Alternatively, you can browse the auto-generated documentation those files create by following the hyperlinks.

Bro has functions for doing math, string manipulation, address manipulation, file handling, type conversions, events, all sorts of stuff. Functions and events can be decorated with attributes too. The &priority attribute can be used to set the order in which an event's handlers run. When an event occurs, the core of Bro collects all blocks of code associated with that event and executes them (you could have eight scripts loaded into the running Bro process and all eight scripts could have code blocks associated with a single event). Setting a higher priority on the handler in one of those eight scripts ensures that code block is executed before the handlers with lower priorities.
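
A small sketch of &priority, using three handlers for the same bro_init event in one file. The priority values are arbitrary; handlers without the attribute get the default priority of 0.

```bro
# Handlers with a higher &priority value run earlier.
event bro_init() &priority=5
{
    print "high priority handler";
}

event bro_init() &priority=-5
{
    print "low priority handler";
}

event bro_init()
{
    print "default priority handler";
}
```

The order in which the handlers appear in the file does not matter here, only their priorities do.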

Run this script and find the bif's it used in the built in function files (or their auto generated documentation equivalents). Try to find at least five new functions you think sound interesting or useful. Pay attention to the parameters' types the functions require and the type returned by the functions.
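
If you don't have the script handy, the following sketch calls a handful of built in functions you can look up in the same way. The choice of functions and the sample values are mine.

```bro
event bro_init()
{
    local host = "WWW.Example.COM";

    # String bifs.
    print to_lower(host);
    print is_ascii(host);

    # Type conversion bifs.
    print to_count("42") + 1;
    print to_addr("10.1.2.3") in 10.0.0.0/8;

    # Time and hashing bifs.
    print strftime("%Y-%m-%d %H:%M:%S", current_time());
    print md5_hash(host);
}
```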

Intro to Brogramming - Logic, Flow Control and Functions

What would a programming language be without conditional statements and loops? Pretty lame. Lucky for you, Bro isn't lame.

Code within Bro script files can be within functions, events (both of which we'll get to later), or neither. This is similar to how code in a Perl/Ruby/Python script can be nicely placed within a subroutine/function or just written haphazardly in the file.
Code placed in functions and events is not run until the function gets called or the event is triggered. Code that is placed in a script but in neither a function nor an event is executed sequentially. To allow the brogrammer to control the flow of a script, scriptland has the following commands
  • if, else if, else - Code following these statements executes if the defined condition is met.
  • ternary - Ternary statements are shorthand if statements and behave the same way.
  • when - Code blocks following these statements execute asynchronously. The code blocks are queued and are executed when the condition is met, independent of the sequential execution happening in the rest of the script.
  • schedule - Schedule a block of code to execute after a user defined duration of time.
  • for - In Bro, for loops are used to iterate over something - a vector, table or set. For loops do NOT guarantee the order in which they visit the elements of a table or set. For loops require a loop variable, which is assigned the current index of the table or vector (or the current element of the set) being iterated over.
  • break - A command to end execution of the nearest loop.
  • next - A command that begins the next iteration of the nearest loop, skipping any commands after the next.
  • return - Usually the last command in a function, it returns a value from a block of code to its caller.

Run this script and be sure you understand everything that is happening internally.

This is what the script does
#scheduling and when
  • creates a user defined event 'e' that prints the count value 'c' passed to it and sets the global count variable 'x' equal to 'c'
  • schedules the 'e' event to run 10 seconds after Bro initializes
  • schedules the 'e' event to run 15 seconds after Bro initializes
  • Asynchronously queues a print statement to execute when the global variable 'x' is equal to ten (this should occur 15 seconds after Bro initializes)
#conditionals
  • Define two boolean values, one to true and one to false
  • Test if 'b2' is true (it's not) and if so print something
  • Test if 'b2' is true (it's not) and if so print something
  • If the previous two tests both failed (which they did) print something
  • Declare an undefined string variable
  • Use the ternary operator to 
    • If 'b1' is true set 's' equal to "b1 is true"
    • Otherwise set 's' equal to "b1 was false"
  • print the value of 's' to STDOUT
# loops
  • Define a set of strings named 'ss'
  • Loop over each string within 'ss', setting 's' to one of the values
  • Print 's' to STDOUT
Chances are you see print output from the conditionals section of code before you see the output from the scheduled section of code. This is because Bro schedules the code to run in the schedules section and continues executing the script sequentially. That is the meaning of asynchronous.
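
For reference, here is a sketch of a script that follows the walkthrough above. It is my reconstruction, not the original script: the value passed to the first 'e' event, the printed strings, the set contents, and the exact conditions in the two failing tests are all assumptions.

```bro
global x: count = 0;

# User defined event: print the value passed in and remember it in 'x'.
event e(c: count)
{
    print fmt("event e got %d", c);
    x = c;
}

event bro_init()
{
    # scheduling and when
    schedule 10 sec { e(5) };
    schedule 15 sec { e(10) };

    when ( x == 10 )
    {
        print "x is now ten";
    }

    # conditionals
    local b1 = T;
    local b2 = F;

    if ( ! b1 )
        print "b1 is false";
    else if ( b2 )
        print "b2 is true";
    else
        print "both tests failed";

    local s: string;
    s = b1 ? "b1 is true" : "b1 was false";
    print s;

    # loops
    local ss: set[string] = set("one", "two", "three");
    for ( s in ss )
        print s;
}
```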

Intro to Brogramming - Complex Data Types

The previous section covered variable conventions and basic atomic data structures in Bro. This section continues to build on the previous section by introducing complex data structures. These complex types are composed of atomic types and behave differently. Below is a list of the complex data structures Bro has available for use.
  • enum (enumerable) - An enum is a user defined type with a fixed set of named values. A variable of an enum type can only hold one of those names. Enums are a little strange; I think of them as sets of labels. Notice types, for example, are enum values.
  • records - Records are collections of related things. Records can be thought of as a single row within a table or a C-like structure. Each variable in a record has a type. Records are very important in Bro. Logging is done with records. Other data structures, such as the connection type, are built using records.
  • sets - Sets are unordered collections of unique values that share a type. Think of a set as a list of things in which each thing appears at most once. A set of strings could list user-agents.
  • tables - Tables are just associative arrays. Tables have keys which map to values.  Tables in Bro are similar to hashes in Perl/Ruby and dictionaries in Python. In Perl a common data structure is a hash of hashes, in Bro it is very possible to have a table of tables.
  • vectors - Vectors are tables which are always indexed by a count. Vectors may be on their way out of the Bro code (I heard something about that). Try not to get too attached to vectors.
  • functions - Functions are named blocks of code, surrounded by curly braces, that can be reused. Functions sometimes return a value and always require a type. If a function doesn't return a value it must have a type of void.
  • events - Events are raised by the C++ event engine within Bro. Code within event blocks is executed when Bro's core raises that event. Events often get passed parameters from the event engine to use within the user defined code block. For example, when the event engine raises a DNS request event it passes scriptland information about that request, like the query string, the IP address making the request, the IP address the request is being sent to, and a bunch of other information.
  • file - A file handle. Bro can write to files, and often does within the logging framework, but the Input framework should be used to read from files. (We'll get to frameworks later on).
Other variable types exist within the Bro source, however they are built on combinations of simple and complex data types presented in this and the previous section and are mostly syntactic sugar. The connection type I mentioned earlier is one such type. It is easy to imagine how these types can be combined to create many different data structures. For example, a database table could be represented by a table of records.

Because Bro deals with network traffic, which has a high volatility and often a high volume, data goes stale quickly and Bro needs a way to deal with this.
Attribute decorations are used to give a variable a special property, such as when the variable expires (variable timers), or if the variable is required to be defined (e.g. within a record). The Bro website details all the built in data types at great length, so I won't. It can be found here.

Run and read this script and be sure you understand what is going on.

Within the script the following occurs
  • A new variable type is built using a record type. 
  • Values are assigned to two instantiations of the NewType and printed to STDOUT. 
  • Sets of ports and strings are created, values are added and removed from them. 
  • The size of the set is calculated and printed to STDOUT. 
  • A port table is created, accessed, and printed to STDOUT using a for loop (we'll get to control flow later).
    Run the script a few times and pay attention to the order the table values are printed to STDOUT each time. For loops in Bro are unordered (random access). Each value will get accessed once within the loop, but their order is NOT guaranteed. This is rather different from languages I was used to and it's bitten me more than once. Be aware of the random access for loop.
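
A sketch of a script along the lines of the steps above, in case you want something to type in. The field names and all values are assumptions of mine, not the original script.

```bro
# A new type built from a record; 'note' may be left unset.
type NewType: record {
    name:     string;
    svc_port: port;
    note:     string &optional;
};

event bro_init()
{
    # Two instantiations of NewType.
    local a: NewType = [$name="web", $svc_port=80/tcp];
    local b: NewType = [$name="dns", $svc_port=53/udp, $note="resolver"];
    print a, b;

    # Sets: add and remove values, then print the size.
    local ports: set[port] = set(22/tcp, 80/tcp);
    add ports[443/tcp];
    delete ports[22/tcp];
    print |ports|;    # 80/tcp and 443/tcp remain

    local names: set[string] = set("alice", "bob");
    add names["carol"];
    print |names|;

    # A table keyed by port; the loop variable holds the current key.
    local services: table[port] of string = table(
        [22/tcp] = "ssh", [53/udp] = "dns", [80/tcp] = "http");

    for ( p in services )
        print fmt("%s -> %s", p, services[p]);
}
```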

Intro to Brogramming - Variables and Simple Data Types

The wordplay titles stop here. Sorry to disappoint.

Brolang is strictly typed, which means a variable must be assigned a type and can only be used in the ways that type allows.  If you've written C code, you're used to this. If you come from the lazy land of Perl, you should expect your variables to rely more on how you define them.
Bro has been built to accommodate network programming which makes it very good at doing just that. Some of these accommodations can be seen when looking at the native data types Bro has to offer a brogrammer. For example, having variables of type addr is rather convenient when working with network data. MySQL users know all too well the pains of not having native IP address data types.
Bro includes the following atomic data types:
  • addr – an addr can contain a network address. IPv4 and IPv6 addresses can be held within an addr type.
  • subnet – a CIDR notation subnet (10.0.0.0/8). Even though a subnet is a collection of addresses, Bro still treats subnets as atomic data types.
  • bool – a bool can contain only two values, true or false (written T and F in Bro scripts). This data type behaves similarly to boolean variables in other languages. An example usage of this variable could be to determine if a condition is met or not.
  • count – a variable of type count contains an unsigned 64-bit integer (a non-negative number). Bro has potential for looking at huge amounts of data off the wire. Sometimes you want to count things (you can’t have a negative number of SSH login attempts).
  • double – a variable of type double holds a double-precision floating point number. Bro has the ability to do math for you and a situation could arise where simple integer math does not fulfill needs.
  • int – similar to counts but signed (can be positive or negative). An example usage of an integer is determining the change in number of connections seen between this hour and a previous hour.
  • interval – a span of time (3 sec/min/hr/day[s]). An interval can be used to measure time relative to something else. You may want to instruct Bro to set a timeout for something or to wait a specific amount of time after starting up to execute a function.
  • pattern – a regular expression. Regular expressions are fantastically powerful. Bro's dynamic protocol detection uses regular expressions to identify protocols on the wire instead of relying solely on standard port numbers.
  • port – a port number and associated protocol (they take the format of 53/udp or 80/tcp). What use is a network programming language without addresses and ports?
  • string – a string of bytes. Strings occur all over the place in Bro. Strings could be domain names, URLs, or even email messages.
  • time – an absolute epoch style time (this data type is absolute, compared to interval, which is relative). This data type references a specific time (clock on the wall time). Sometimes a brogrammer might want to record the exact time a connection took place.
  • void - the absence of a type. This type is usually seen associated with functions (we'll get to those later).
  • any - this type is used to bypass Bro's strong typing. This type is also associated mostly with functions.
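
To give a feel for how these types look in a script, here is a short sketch that declares one variable of most of the atomic types listed above. All of the values are made up.

```bro
event bro_init()
{
    local a: addr = 192.168.1.10;
    local n: subnet = 192.168.0.0/16;
    local ok: bool = a in n;          # is the address inside the subnet?
    local c: count = 3;
    local i: int = -3;
    local d: double = 0.5;
    local wait: interval = 5 min;
    local re: pattern = /\.example\.com$/;
    local p: port = 53/udp;
    local s: string = "www.example.com";
    local t: time = current_time();

    print a, n, ok, c, i, d, wait, p, s, t;

    # 'pattern in string' tests whether the pattern matches somewhere in the string.
    print re in s;
}
```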
   
Just like many other programming languages, brolang has the concept of scope. A variable can be declared to have one of three different scopes.
  • const - the variable cannot be changed (this can be trumped with decorations)
  • global - the variable is available to all loaded scripts
  • local - the variable is local to the module, function or event
Declaring and defining a variable can be done by deciding a scope, name, type, and value for the variable. Let's say I want to print my name to the screen. I'll need a variable of type string and I'd like to call it 'myname'. The variable can be declared and defined the following way
    const myname: string = "anthony";
   
Because myname probably won't change, I've set it to be a constant.
If I wanted to have a variable that held my age, I could define it in the following way
   
    global myage: count = 42;
   
Because myage will never be negative, I've assigned it a type of count and not int. Run this script and try to understand what happens with the myage variable and its scope. The bro_done event won't fire until you terminate the Bro process (ctrl+c).
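
Here is a sketch of what such a script could look like, using the two declarations from above. The local variable and the printed messages are assumptions of mine.

```bro
const myname: string = "anthony";
global myage: count = 42;

event bro_init()
{
    # 'next_age' only exists inside this event handler.
    local next_age: count = myage + 1;
    print fmt("%s is %d", myname, myage);

    # Globals can be changed from any handler and keep their new value.
    myage = next_age;
}

event bro_done()
{
    # The change made in bro_init is still visible here.
    print fmt("%s is now %d", myname, myage);
}
```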

Broverview

Bro is an asynchronous event driven platform. Take some time right now and be sure to understand what that means because it is key to how Bro operates and should be well understood to be a brogrammer.

Bro has an event engine that raises events when it sees something happen on the network (those things could be DNS requests, UDP packets, a specific email subject, or a file MD5). It's up to the brogrammer to decide which events to handle and what should happen (aka Bro policies) when those events are raised. Bro programmers sit at the policy script interpreter layer, waiting for events to be raised by the core event engine.

Bro reads raw data off the wire or from a pcap file (by default using libpcap), similar to tcpdump or Snort. The packets Bro reads are reassembled into streams. Bro has many events that are raised from the network streams it reassembles. A simple example is the new_connection event. The Bro core (where the event engine is located) raises this event and it 'occurs' within scriptland whenever Bro sees a new connection happen. Bro also raises events about itself. For example, Bro raises an event right as it begins running and right before it terminates. Bro has many more events which will be covered in a later post.

In addition to Bro's event engine, Bro has a scripting layer that allows brogrammers to define what happens when an event is triggered. For now, imagine Bro's event engine only has the three example events I mentioned above (new_connection when a new network connection occurs, bro_init when Bro first begins running, and bro_done just before Bro terminates).
With only these three events a brogrammer could instruct Bro to
  • print a nice welcoming message when Bro starts, perhaps "Sup bro, sup?"
  • print a different message, perhaps "Yo bro, a new connection happened", whenever a new connection happens on the wire
  • print a closing message when Bro is told to terminate, perhaps "I'm going down, bro"
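
The three bullets above can be sketched as three event handlers. Treat this as a minimal stand-in for broverview.bro rather than the exact script.

```bro
# Raised once when Bro starts.
event bro_init()
{
    print "Sup bro, sup?";
}

# Raised each time Bro sees a new connection on the wire.
event new_connection(c: connection)
{
    print "Yo bro, a new connection happened";
}

# Raised once just before Bro terminates.
event bro_done()
{
    print "I'm going down, bro";
}
```
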
This script does just that. Download and run the script with the following command. This tells Bro to begin monitoring traffic on the eth0 interface and to run with the policy script provided.

/usr/local/bro/bin/bro -i eth0 broverview.bro

Terminate the Bro process (with ctrl+c) and inspect the output that is printed to STDOUT. You should see the messages defined in bro_init and bro_done, but Bro might not have printed the message from the new_connection event. If it didn't, that's because Bro didn't see any new connections while it ran. Try running the script and generating a new connection across the interface.

Read through the broverview.bro script. Syntactically, Brolang, or the Bro scripting language, looks similar to many other C based languages. Brolang inherits semi-colon delimited lines, curly brace code blocks, and other conventions from the C++ core it is built on top of.

You might have noticed that the event handlers in the broverview.bro script aren't executed in the order they appear in the script. This is because they don't execute unless Bro sees the event on the wire and 'handles' it. This non-sequential execution of events is where the event-driven programming reference I made at the beginning of this post comes into play. Don't worry if you don't understand all the particulars of the script just yet.

Our example is very simple and somewhat useless, but it shows how a brogrammer can specify actions (in Bro vernacular, 'policies') for Bro to take when an event occurs. Bro is capable of doing much more than just printing to STDOUT. In useful Bro scripts, nothing is printed to STDOUT. Instead, things are logged to files or alerts, like emails, are generated.
Bro can send you an email, raise an alert, execute a system call, or just about anything else you can script. This is where Bro's power shines through - in extending and customizing event handling.

You might have noticed that Bro, by default, generates logs in the directory you execute Bro in. This is because, by default, Bro is configured with policy scripts that tell it to log information. For example, when an HTTP request and response are seen and completed (events are raised for HTTP seen on the wire), Bro writes some information about the HTTP connection to disk. Try running Bro on an interface while browsing to this blog and see what log files get written to disk. Open the log files and try to determine what the fields in the file are.

The events Bro raises are neutral. The Bro event engine's job is to let scriptland know when something happened. It is up to the brogrammer to determine if the event is good, bad, actionable or uninteresting. Thus Bro isn't really an IDS, but a network investigation platform. You could certainly build an IDS from Bro's scriptland but you could also build Conway's game of life. The core of Bro simply passes summaries of the network up to scriptland for a brogrammer to script against. If I don't care about SMTP traffic, I can tell Bro (in scriptland) not to do anything when the Bro core reassembles SMTP traffic. If I do care about SSL traffic, I can write a script to extract all SSL certificates Bro sees. In fact, that's what the ICSI Certificate Notary does.

Hopefully you now understand how Bro inspects live or stored traffic and generates events from it, and how those events are passed to scriptland to be handled by user defined scripts or policies.

If you still have questions, feel free to ask in the comment section. A complete summary of the Bro project can be found here. I recommend having the Bro cheat sheet on hand throughout the rest of these posts.

Building and Running Bro

In this article I will introduce how to get Bro up and running on a Debian based system as quickly as possible with as many features as possible. The rest of the articles will assume you followed this guide. You should know that the Bro website has a very useful quickstart guide of its own, but I've decided to write down the procedure I often use. If you have questions or want to learn more about the different ways to install Bro, you should visit their guide.

The first thing to do is to set a system up. You can use a VM or build a physical machine, it doesn't matter. We'll be grabbing the source code and compiling it to make sure we get all the latest features and frameworks.
Bro depends on some packages for compilation to succeed. You'll need to install these before you can build Bro. Run the following command to do so:

sudo apt-get install git cmake make gcc g++ flex bison libpcap-dev libssl-dev python-dev swig zlib1g-dev libmagic-dev

Now, download the latest Bro source code via git by running

git clone --recursive git://git.bro-ids.org/bro

Make sure you include the recursive option or you won't grab everything you need to compile Bro. When the clone is finished you should see a 'bro/' directory sitting in your working directory. Change into that directory. The default directory Bro is installed to is /usr/local/bro (that's what I'll be using for this and other blog posts) but that can be changed with the '--prefix=/desired/path' option. Configure the installation by running:

./configure

Compile and install the Bro software by running:

make
make install

NOTE: you might need root privileges to install Bro

You can add bro to your path as a convenience by altering your PATH environment variable and exporting it, but I usually don't bother. Once Bro is installed, run it by calling it with the -i option and specifying an interface name to begin sniffing on.

/usr/local/bro/bin/bro -i eth0

If Bro starts successfully you should get some output similar to "listening on eth0, capture length 8192 bytes". Terminate Bro the same way you would with any other long running terminal process (ctrl + c). Depending on the packets Bro saw on that interface you might have some new files in your working directory with a .log extension.

Congrats, you just installed and ran Bro!

Bropening Statements

Welcome! If you are looking to learn more about Bro as a programming language, you have come to the right place. Bro is a network monitoring platform that can provide insight into what is happening on a network or what has happened within a trace file. Bro has the capabilities to collect traffic to a trace file, identify specific patterns within traffic, and summarize flows. Bro can provide the same capabilities provided by tcpdump, Snort, and netflow combined. And more!

Bro is also programmable, meaning an operator is able to script between the previously mentioned capabilities. It comes with a large set of default and tunable scripts. These included scripts are used to build Bro modules and frameworks which extend what Bro can do. As Bro is extremely extensible, mastering its domain specific language unlocks the potential of Bro.

This site is dedicated to programming with the Bro language. The Bro documentation can be intimidating for beginners and the constant development that makes Bro so useful means keeping up can be a challenge. For these reasons, I intend to post a series of 'getting started' articles for understanding the fundamentals of programming with Bro.

I hope you find this blog useful. Feel free to leave comments if you have any questions or suggestions.