Monthly Archives: December 2013

Formal Languages

As always, life in Budapest is very busy, but I’ve managed to produce another blog post. This was my first Thanksgiving away from my family, but all the students at AIT Budapest (all 40 of us) got together with the professors for a massive feast with traditional Hungarian and American dishes. Many of the Hungarian students had never celebrated Thanksgiving before, so it was a lot of fun to experience it with them.

In my Algorithms for Bioinformatics class, we were introduced to the abstract concept of formal grammars in computer science and their interesting applications in real biological systems. A grammar is essentially a set of production rules for generating a string in a language. The rules form strings from the language’s alphabet that are syntactically valid. However, the output strings do not have inherent meaning they are just valid for the language. A formal grammar is a set of rules accompanied by a start symbol for initializing the process. The application of Formal Language Theory is used in theoretical computer science, theoretical linguistics, formal semantics and mathematical logic. Graduate students at MIT created a program to generate random research papers that are syntactically correct, but when read over, have no real world meaning. Below is a paper that was generated for me, take a look!

MIT Paper Generator

A couple of these papers have actually been accepted at conferences (very low ranking conferences, but conferences none the less!). In biological systems, a Lindenmayer systems a type of formal grammar that consists of an alphabet of symbols, a collection of production rules, an initial axiom string and a mechanism for translating the generated strings into geometric structures. From these generated strings, scientists can actually generate accurate predictions of plant structure, like the one created below.

Language-Generated Trees
Besides modeling plant growth, context free grammars can model protein folding as well. In this, the language is the string of amino acids, and the production rules will create folds resulting in alpha helices and beta sheets, which accurately resemble real world protein structure.