The Levenshtein distance algorithm is often used by spell checkers to determine whether a word is a misspelling of another word in a dictionary. The algorithm takes two strings as input and measures how different they are by counting the minimum number of single-character insertions, deletions and substitutions it takes to transform one string into the other. Let's see how one would go about implementing this as a built-in function (bif) for use in scriptland in Bro's C++ core.
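To make the counting concrete, here is a minimal standalone C++ sketch of the textbook dynamic-programming formulation (often called the Wagner-Fischer algorithm). It is not Bro code: the function name levenshtein_distance and its std::string interface are illustrative assumptions. Cell d[i][j] of the table holds the distance between the first i characters of one string and the first j characters of the other, and each cell is built from its three neighbours.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative Levenshtein distance using a full (n+1) x (m+1) table.
// d[i][j] = edits needed to turn the first i chars of a into the first j chars of b.
std::size_t levenshtein_distance(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();
    std::vector<std::vector<std::size_t>> d(n + 1, std::vector<std::size_t>(m + 1));

    // Turning a prefix into the empty string (or back) costs one edit per character.
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = i;
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = j;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                d[i][j - 1] + 1,          // insertion
                                d[i - 1][j - 1] + cost}); // substitution or match
        }
    }
    return d[n][m];
}
```

For example, "kitten" and "sitting" are three edits apart: substitute k with s, substitute e with i, and insert g at the end.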
Hopefully you still have the directory you created when cloning the Bro git repo in Building and Running Bro. If not, re-clone it with the following command:
git clone --recursive git://git.bro-ids.org/bro
Open the file that defines the built-in functions for handling strings and have a look around:
vi bro/src/strings.bif
This file contains specially crafted C++ that gets parsed and compiled into Bro when you run the ./configure, make and make install commands. Every string-related bif available in scriptland is defined in this file. I essentially stumbled my way through it, as I don't truly know C++ or the extensions the Bro project has built into the core's C++ code.
The function here can be added to the src/strings.bif file and compiled into Bro. The way I came up with this new functionality was to:
- Find a basic function and see how it works. I saw is_ascii as a good function to copy from.
- Rip out what the function does and add your own logic. The core C++ requires you to use Bro-defined functions such as Len() and Bytes().
- Be sure to wrap whatever you are returning to scriptland in a Val() or StringVal().
- Compile your added function into Bro while praying you did everything correctly.
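Following those steps, the core logic might look something like the hypothetical helper below. It is written as plain standalone C++ so it can be compiled and tested outside Bro: it takes each string as a raw byte pointer plus a length, the same kind of data that Bytes() and Len() give you, and returns an unsigned count. The name levenshtein_bytes and its signature are my own assumptions, and the step of wrapping the result in a Val() for scriptland is left out because it depends on Bro's headers. Only two rows of the table are kept, so memory grows with the length of the second string rather than with the product of both lengths.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: each string arrives as (bytes, length), the shape of data
// that Bytes() and Len() return. Lengths are assumed to be non-negative.
// Raw bytes are compared, so a multi-byte UTF-8 character may count as several edits.
unsigned int levenshtein_bytes(const unsigned char* a, int a_len,
                               const unsigned char* b, int b_len)
{
    // prev holds row i-1 of the DP table, cur holds row i.
    std::vector<unsigned int> prev(b_len + 1), cur(b_len + 1);

    for (int j = 0; j <= b_len; ++j)
        prev[j] = j;  // empty prefix of a -> first j bytes of b: j insertions

    for (int i = 1; i <= a_len; ++i) {
        cur[0] = i;   // first i bytes of a -> empty string: i deletions
        for (int j = 1; j <= b_len; ++j) {
            const unsigned int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, cur);  // the row just filled becomes the previous row
    }
    return prev[b_len];  // after the final swap, prev is the last row
}
```

Inside the bif body you would pass each string argument's Bytes() and Len() to a function like this and wrap the returned count in a Val() before handing it back to scriptland.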