Sunday, March 30, 2014

Why a New Language?

I recently had a conversation about Bro with someone I will describe as a security connoisseur. He asked me what the benefits of Bro were given the ubiquity and age of Snort (insinuating some correlation between age and reliability). This comparison often comes up when I mention Bro. I promptly pointed out that Bro is just as old as Snort and that Bro's DSL provides flexibility for stream analysis which Snort cannot provide. Stating that new programming languages are only adopted when they solve a new problem, he asked what problem the Bro language solves. My immediate answer was native network data types. This hastily provided answer was incomplete and didn't convince him. The connoisseur responded that native network data types are only a matter of mathematical conversions and aren't anything special.

I've been rolling the question around in my head for a bit now. What new problem does the Bro language solve, and why should it be used over an existing language? First off, it's a domain-specific language and is not intended to be used as a general-purpose language like Python. Secondly, I wouldn't say the language solves a new problem. The language (and the platform it is tied to) solves an old problem, that of determining what is occurring on a network, in a new way.
The Bro language provides these solutions:
  • Both packet analysis and connection analysis are available
  • The ability to analyze high performance links without needing to know C/C++
  • Native network data types (I still consider this a benefit)
    • if ( in
    • local http: port = 80/tcp
  • A single interface to the otherwise complex tasks of reassembling streams, parsing protocol fields, defining policies and acting on those policies
    • As opposed to gluing different tools (tcpdump, Snort, Razorback, pynids, etc.) together
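To make the native data types point a little more concrete, here is a minimal sketch. The address, the subnet and every variable name other than http are made up for illustration; they are not taken from a real deployment.

```bro
event bro_init()
    {
    # port, addr and subnet are built-in types with their own literal syntax.
    local http: port = 80/tcp;
    local server: addr = 192.168.1.10;     # example address (assumption)
    local lan: subnet = 192.168.0.0/16;    # example subnet (assumption)

    # Subnet membership is a language operator; no manual masking is needed.
    if ( server in lan )
        print fmt("%s is inside %s", server, lan);

    # A port value carries its transport protocol along with the number.
    if ( http != 80/udp )
        print "80/tcp and 80/udp are different values";
    }
```

The conversions the connoisseur mentioned still happen, but the language does them, so a script reads like the policy it expresses.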
Hopefully this post provides a more verbose answer than simply "native network data types".

Sunday, March 9, 2014

Learning the Hard Way

I learn programming in two ways. I learn what works and try to repeat it. I learn what breaks and try to avoid it. If you want to learn how to do something, learn all ways to not do that something.
This may be obvious, but learning to write Bro means learning another programming language. The Bro scripting language is somewhere in between C++ and Python and, like any language, comes with its own pitfalls, quirks and learning curve.
Here is a list of gotchas I've run across while learning to write Bro scripts.

The signature framework shouldn't be used the way you might think.
When I first saw Bro had a signature framework I thought I'd dump Emerging Threats Snort rules into it somehow and everything would be great. That's not the case. While Bro's signature framework is powerful, Bro isn't Snort. Bro uses signatures, similar to Snort, to identify something specific. A possible cleartext shell, a possible SSN, or possible HTTP traffic are all examples. Bro doesn't just alert when a signature matches though. Bro can pass information about the connection which matched to script land. For example, to detect protocols on non-standard ports, Bro uses signatures to analyze and tag connections. Those tags indicate to other scripts how to handle the connection.
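As a rough sketch of how a match reaches script land: the signature name, the pattern, the message and the file name below are all hypothetical, and the signature itself is shown as a comment because it lives in a separate file.

```bro
# Contents of a hypothetical example.sig, loaded below:
#
#   signature example-sig {
#       ip-proto == tcp
#       payload /.*root/
#       event "possible root shell"
#   }

@load-sigs ./example.sig

# Raised when a signature with an event statement matches. The state
# record carries the connection, so a script can decide what to do next.
event signature_match(state: signature_state, msg: string, data: string)
    {
    if ( state$sig_id == "example-sig" )
        print fmt("%s: %s (connection %s)", state$sig_id, msg, state$conn$uid);
    }
```

The handler only prints here; a real script would more likely tag the connection or raise a notice.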

Don't change the provided scripts.
This one is pretty simple. The Bro team provides a collection of scripts with the project. Don't change them in place: installing Bro will overwrite the built-in scripts. Instead, write your own scripts and load them into Bro via %PREFIX%/share/bro/local.bro or by specifying them on the command line when invoking Bro. Most values and variables in the provided scripts can be redefined from your own scripts.
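A minimal sketch of redefining a value from your own script follows. The file name and the networks are assumptions for the example; Site::local_nets is one of the options the base scripts declare as redefinable.

```bro
# my-site.bro (hypothetical file name), loaded from local.bro or named on
# the command line. The scripts shipped with Bro stay untouched.

# Site::local_nets is declared with &redef in the base scripts,
# so it can be extended from here.
redef Site::local_nets += { 10.0.0.0/8, 192.168.0.0/16 };
```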

For loops are random access.
For loops in Bro scripts are more like iterators in Python than C-style for loops. Forget about counters. It is possible to hack up counters in for loops, but problems usually can be solved without them. For loops, like the majority of things in Bro, don't visit elements in a predictable order. Which leads into the next quirk.
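A small sketch of what iteration looks like in practice. The table contents are invented for the example.

```bro
event bro_init()
    {
    local services: table[port] of string = {
        [22/tcp] = "ssh",
        [80/tcp] = "http",
        [443/tcp] = "https"
    };

    # The loop variable takes each index of the table in turn;
    # the order of the visits is not something to rely on.
    for ( p in services )
        print fmt("%s -> %s", p, services[p]);

    # The size operator usually replaces a hand-rolled counter.
    print fmt("%d services", |services|);
    }
```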

Everything in scriptland is asynchronous.
This is a blessing, as long as you understand the paradigm. Events in scriptland don't happen in the order they appear in the file (unlike a procedural Bash script, for example). The same event can be handled in multiple different Bro scripts, which get loaded in different ways. All the handlers for an event with the same name run when that event occurs (ignoring priority). This is really powerful if you think about it. On a saturated link Bro can't wait for some event handling body to finish; it has more packets to process.
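To illustrate, here is a sketch of two handlers for the same event. They are shown in one file for brevity but could just as well live in different scripts; the messages are placeholders.

```bro
# Both handlers run every time the event is raised. The &priority
# attribute influences which one runs first (higher values go first).

event connection_established(c: connection) &priority=5
    {
    print fmt("higher priority handler: %s", c$uid);
    }

event connection_established(c: connection) &priority=-5
    {
    print fmt("lower priority handler: %s", c$uid);
    }
```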
Beware: the input framework is asynchronous too. Bro can't wait for the input framework to finish reading a file from disk; it has packets to process (even if those packets are in a tiny pcap). This can lead to packet processing finishing before some input processing does.
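The sketch below shows one way to find out when the input framework has finished reading. The file name, stream name and record layouts are assumptions for the example, and the event name is the one I believe recent Bro releases use; check it against your version.

```bro
type Idx: record {
    ip: addr;
};

type Val: record {
    reason: string;
};

global blacklist: table[addr] of Val = table();

event bro_init()
    {
    # Returns right away; the file is read in the background.
    Input::add_table([$source="blacklist.file", $name="blacklist",
                      $idx=Idx, $val=Val, $destination=blacklist]);
    }

# Raised once the whole file has been read. Only from here on is the
# table known to be fully populated.
event Input::end_of_data(name: string, source: string)
    {
    if ( name == "blacklist" )
        print fmt("loaded %d entries", |blacklist|);
    }
```

Any handler that consults the table before this event fires may see it empty or only partly filled.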

Most of the documentation is in the source.
It's difficult to write solid C++ and Python, let alone solid documentation about solid C++ and Python. This is slowly changing as more documentation is released. I found the Bro Exchange and Bro Workshop videos extremely helpful. To learn Bro scripting concepts, learn to navigate the source (and learn your grep flags).

Connection IDs are mostly forever unique.
Running Bro on a trace file generates nice log files with connections indexed by unique connection identifiers (UIDs). Running Bro twice on one trace file will generate different connection UIDs. This makes working back from logs to pcaps a bit challenging.

Protocol analyzers are a challenge to write.
Perhaps it is because I'm not a seasoned C++ developer, but creating new protocol analyzers is difficult. The current analyzers are, however, quite robust. Just beware when your traffic is an edge case of the protocol spec. I've run into a couple of instances where a few HTTP streams used a combination of 0x0d and 0x0a bytes (CR and LF) that Bro could not identify. The connection didn't end up in the HTTP log file as I expected.
Raw packet contents are exposed to scriptland through events such as packet_contents. However, protocol analyzers written in scriptland are expensive in terms of processing.

Most of the challenges listed above can make Bro scripting difficult to navigate at times. However, they ultimately exist because Bro is built to scale. Any state a script keeps in scriptland has the potential to be harmful as network traffic is analyzed. Beware global variables and data that never expires.
Beware the order of events and processing.
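One way to keep global state from growing without bound is to let Bro expire it. A minimal sketch, with the variable name and the ten-minute timeout chosen arbitrarily for the example:

```bro
# Entries disappear ten minutes after they were added, so the set
# cannot keep growing on a busy link.
global seen_hosts: set[addr] &create_expire=10min;

event connection_established(c: connection)
    {
    local orig = c$id$orig_h;

    if ( orig !in seen_hosts )
        {
        add seen_hosts[orig];
        print fmt("new originator: %s", orig);
        }
    }
```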

If scripts are written by programmers unaware of the potential pitfalls, then unexpected failures can occur. Hopefully this list will help script developers better understand Bro and write bug-free, scalable scripts.