Friday, August 9, 2013

Extending Bro's Core

If you've made it this far, congrats: you now know more about Bro than almost everyone, ever. Bro is a great framework and has been going through tremendous growth in the past few years. New modules, new functionality, and new features seem to land in Bro every time I hear about it. Still, if you use Bro enough, you may eventually find that it doesn't have everything you need. Luckily, Bro is open source, so you can add functionality to it yourself.

The Levenshtein distance algorithm is often used by spell checkers to determine whether a word is a misspelling of another word in a dictionary. The algorithm takes two strings as input and calculates how different they are by counting the minimum number of single-character insertions, deletions and substitutions it takes to transform one string into the other. Let's see how one would go about implementing this in Bro's C++ core as a built-in function (bif) for use in scriptland.
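To make the counting concrete, here is a minimal standalone sketch of the textbook dynamic-programming version in plain C++. It is not Bro code and not the function from this post; the names levenshtein and check_levenshtein_examples are just ones I picked for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Textbook dynamic-programming Levenshtein distance.
// dist[i][j] is the edit distance between the first i characters of a
// and the first j characters of b.
int levenshtein(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<int>> dist(n + 1, std::vector<int>(m + 1, 0));

    // Turning a prefix into the empty string takes one deletion per character,
    // and building a prefix from the empty string takes one insertion each.
    for (std::size_t i = 0; i <= n; ++i) dist[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) dist[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            dist[i][j] = std::min({dist[i - 1][j] + 1,          // deletion
                                   dist[i][j - 1] + 1,          // insertion
                                   dist[i - 1][j - 1] + cost}); // substitution
        }
    }
    return dist[n][m];
}

// A few well-known cases, written as assertions.
void check_levenshtein_examples()
{
    assert(levenshtein("kitten", "sitting") == 3);
    assert(levenshtein("flaw", "lawn") == 2);
    assert(levenshtein("", "abc") == 3);
}
```

The full table costs memory proportional to the product of the two lengths, which is fine for short strings like words or hostnames.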

Hopefully you still have the directory you created when you cloned the Bro git repo in Building and Running Bro. If not, re-clone it with the following command.
    git clone --recursive git://

Open the file that defines the built-in functions that handle strings and have a look around.
    vi bro/src/strings.bif
This file contains specially crafted C++ that gets parsed and compiled into Bro when you run the ./configure, make, and make install commands. Every string-related bif available in scriptland is defined in this file. I essentially stumbled my way through it, as I don't truly know C++ or the extensions the Bro project has built into the core's C++ code.

The function here can be added to the src/strings.bif file and compiled into Bro. The way I came up with this new functionality was to
  1. Find a basic function and see how it works. I picked is_ascii as a good function to copy from
  2. Rip out what the function does and add your own logic. The core C++ requires you to use Bro-defined functions such as Len() and Bytes()
  3. Be sure to wrap whatever you are returning to scriptland in a Val() or StringVal()
  4. Compile your added function into Bro while praying you did everything correctly
Once Bro finishes recompiling, you should be able to call levenshtein_distance(s1: string, s2: string) from scriptland and receive a correct distance metric.
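For step 2, it helps to write the logic against a raw byte pointer plus a length, since that is the shape I'm assuming Bytes() and Len() hand back. Below is a rough sketch in plain C++ under that assumption. The name levenshtein_bytes is hypothetical, and the bif wrapper itself (the argument handling and the Val() wrapping from step 3) is deliberately left out because its exact syntax depends on your Bro version.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Levenshtein distance over raw bytes, keeping only two rows of the table.
// In a bif, s1/len1 and s2/len2 would presumably come from calling
// Bytes() and Len() on the two string arguments (an assumption, check
// the existing functions in strings.bif for the real accessors).
int levenshtein_bytes(const unsigned char* s1, int len1,
                      const unsigned char* s2, int len2)
{
    assert(len1 >= 0 && len2 >= 0);

    // prev[j] is the distance between the first i-1 bytes of s1 and the
    // first j bytes of s2; curr[j] is the same for the first i bytes of s1.
    std::vector<int> prev(len2 + 1);
    std::vector<int> curr(len2 + 1);
    for (int j = 0; j <= len2; ++j) prev[j] = j;

    for (int i = 1; i <= len1; ++i) {
        curr[0] = i;
        for (int j = 1; j <= len2; ++j) {
            const int cost = (s1[i - 1] == s2[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution
        }
        prev.swap(curr);
    }
    return prev[len2];
}
```

Working from explicit lengths rather than NUL terminators matters here, because strings pulled off the wire can contain embedded zero bytes.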
