Author Archives: afmcdonald

About afmcdonald

UPS student studying abroad in Budapest, Hungary. I am doing a computer science program with an emphasis on Neuroscience and biological networks.

Formal Languages

As always, life in Budapest is very busy, but I’ve managed to produce another blog post. This was my first Thanksgiving away from my family, but all the students at AIT Budapest (all 40 of us) got together with the professors for a massive feast with traditional Hungarian and American dishes. Many of the Hungarian students had never celebrated Thanksgiving before, so it was a lot of fun to experience it with them.

In my Algorithms for Bioinformatics class, we were introduced to the abstract concept of formal grammars in computer science and their interesting applications in real biological systems. A grammar is essentially a set of production rules for generating strings in a language. The rules build strings from the language’s alphabet that are syntactically valid. However, the output strings have no inherent meaning; they are merely valid in the language. A formal grammar is a set of such rules accompanied by a start symbol that initializes the process. Formal language theory is used in theoretical computer science, theoretical linguistics, formal semantics and mathematical logic. Graduate students at MIT created a program that generates random research papers that are syntactically correct but, when read over, have no real-world meaning. Below is a paper that was generated for me; take a look!

MIT Paper Generator
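
To make the idea of production rules and a start symbol a little more concrete, here is a minimal sketch of a random sentence generator. The toy grammar, the symbol names and the `generate` function are all made up for illustration; this is not the code behind the MIT generator, just the same general idea at a tiny scale.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are nonterminals; any word that is not a key is a terminal.
GRAMMAR = {
    "SENTENCE": [["NOUN_PHRASE", "VERB_PHRASE"]],
    "NOUN_PHRASE": [["the", "ADJECTIVE", "NOUN"], ["the", "NOUN"]],
    "VERB_PHRASE": [["VERB", "NOUN_PHRASE"]],
    "ADJECTIVE": [["scalable"], ["random"], ["formal"]],
    "NOUN": [["algorithm"], ["network"], ["grammar"]],
    "VERB": [["generates"], ["refutes"], ["models"]],
}


def generate(symbol, rng):
    """Expand a symbol by recursively applying randomly chosen production rules."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals are emitted unchanged
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(generate(part, rng))
    return words


if __name__ == "__main__":
    rng = random.Random()
    # "SENTENCE" plays the role of the start symbol.
    print(" ".join(generate("SENTENCE", rng)))
```

Every sentence this prints is grammatical according to the rules, but nothing in the rules cares whether it means anything, which is exactly why the generated papers read like nonsense.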

A couple of these papers have actually been accepted at conferences (very low-ranking conferences, but conferences nonetheless!). In biological modeling, a Lindenmayer system (L-system) is a type of formal grammar that consists of an alphabet of symbols, a collection of production rules, an initial axiom string and a mechanism for translating the generated strings into geometric structures. From these generated strings, scientists can actually generate accurate predictions of plant structure, like the one created below.

Language-Generated Trees
Besides modeling plant growth, context-free grammars can model protein folding as well. Here, the language is the string of amino acids, and the production rules create folds resulting in alpha helices and beta sheets, which closely resemble real-world protein structure.
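
Going back to the Lindenmayer system described above, its four ingredients (alphabet, production rules, axiom and a geometric interpretation) can be sketched in a few lines. The function names `rewrite` and `to_segments`, the 25-degree angle and the branching rule are my own choices for illustration, not something from class; the drawing commands follow the usual turtle-graphics convention.

```python
import math


def rewrite(axiom, rules, generations):
    """Apply the production rules to every symbol in parallel, once per generation."""
    current = axiom
    for _ in range(generations):
        # Symbols without a rule are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


def to_segments(commands, angle_degrees=25.0, step=1.0):
    """Interpret a string as turtle commands and return the line segments drawn.

    F = draw forward, + / - = turn left / right,
    [ = remember the current position and heading, ] = return to it.
    """
    x, y, heading = 0.0, 0.0, 90.0  # start at the origin, pointing up
    stack, segments = [], []
    for command in commands:
        if command == "F":
            new_x = x + step * math.cos(math.radians(heading))
            new_y = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (new_x, new_y)))
            x, y = new_x, new_y
        elif command == "+":
            heading += angle_degrees
        elif command == "-":
            heading -= angle_degrees
        elif command == "[":
            stack.append((x, y, heading))
        elif command == "]":
            x, y, heading = stack.pop()
    return segments


if __name__ == "__main__":
    # Lindenmayer's algae model: A -> AB, B -> A, starting from the axiom "A".
    print(rewrite("A", {"A": "AB", "B": "A"}, 4))  # ABAABABA

    # An illustrative branching rule; the brackets create side branches.
    plant = rewrite("F", {"F": "F[+F]F[-F]F"}, 3)
    print(len(to_segments(plant)), "line segments")
```

The list of segments can be handed to any plotting library to draw the plant. Changing the rule or the angle changes the shape of the tree.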

Jo Napot Kivanok

Budapest is finally starting to admit that summer is indeed over, and the city is transitioning to crisp autumn weather. The outdoor Turkish baths are shutting down, forcing bathers indoors to the ornate swimming halls. Every day I try to pick up a little more Hungarian; my vocabulary and conversations are currently limited to ordering food and describing myself (Amerikai diak vagyok, "I am an American student"). Even though the Iron Curtain fell many years ago, it is fascinating to see everyday throwbacks to how life was back in that time (oppressive grey apartment buildings, people pushing wheelbarrows of hundreds of potatoes down a busy street). The city of Budapest is actually incredibly developed, with a better public transit system than I’ve seen anywhere in the States. While here, I found a lacrosse team to play with, and we traveled this past weekend to Serbia to compete in a multinational tournament that we ended up winning! The team is filled with a bunch of goofballs:

bolasz

Everyone is super friendly and willing to let me practice my weak Hungarian on them. Most people here, not just on the team, actually speak very good English.

In my classes, we are talking about large datasets gathered from biology, such as genome and protein sequencing, and the issues that arise from data management and analysis.

The cost of sequencing an entire human genome has fallen drastically (to under $5000, and projected to approach $1000), as has the time needed to perform the sequencing. But with this great technology comes the burden of overwhelming amounts of data. Scientists are now working not only on improving biological reading techniques, but also on ways to manage the data. The most pressing issues are data transfer, standardization of data formats, access control and data integration.

One approach to the problems presented above is a concept known as cluster computing. The goal is to achieve supercomputer performance without actually possessing a supercomputer. Many computers on a single local network are linked together so that they can function as a single computer. This method is extremely cost-effective and provides supercomputer performance for a fraction of the price. However, the other costs associated with this method (a specialized facility and hardware, as well as extremely knowledgeable IT support) are potential drawbacks.

To overcome some of these issues, many companies are switching to cloud computing for their data storage and analysis. In the cloud, an on-demand shared pool of computing resources is available whenever needed for a very low cost. This is especially effective when the data does not need to be accessed continuously, but is instead read for one-off tasks. Cloud computing comes with its own set of drawbacks, such as privacy concerns about keeping health records in a public space and the network bandwidth restrictions associated with uploading large datasets into the cloud.

Similar to both cloud and cluster computing is the method of grid computing. In grid computing, tasks are distributed to ‘loosely’ connected computers (as opposed to a single network of computers in cluster computing). These computers could be anywhere in the world, in different companies, or even volunteers’ laptops at home. This enables companies to muster huge computational power at almost no cost to them. Like cloud computing, grid computing suffers when transferring or uploading data. Additionally, there is minimal control over the hardware that the programs are actually running on. One way of speeding grid computing up comes from the practice of heterogeneous computing, in which each machine uses accelerators, such as GPUs, so that a single computer behaves somewhat like a small cluster of its own.
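
The idea shared by cluster and grid computing, splitting one big job into independent pieces, handing the pieces to different workers and merging the partial results, can be sketched on a single machine. In the sketch below, threads stand in for the separate computers, and counting short subsequences (k-mers) in a DNA string is just an example task I picked; a real cluster or grid would use a job scheduler or a distributed framework instead. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(sequence, k):
    """Count every length-k substring of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def split_with_overlap(sequence, chunk_size, k):
    """Split the sequence into chunks that overlap by k - 1 characters.

    The overlap ensures that a k-mer spanning a chunk boundary is counted
    exactly once, by the chunk in which it starts.
    """
    return [sequence[start:start + chunk_size + k - 1]
            for start in range(0, len(sequence), chunk_size)]


def distributed_count(sequence, k, chunk_size, workers=4):
    """Count k-mers chunk by chunk in parallel, then merge the partial counts."""
    chunks = split_with_overlap(sequence, chunk_size, k)
    total = Counter()
    # Each thread stands in for one worker node (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    genome = "ACGTACGTGGCA" * 1000  # made-up sequence, not real data
    counts = distributed_count(genome, k=3, chunk_size=500)
    print(counts.most_common(3))
```

Even in this toy version, the chunks have to be shipped to the workers and the results shipped back, which hints at why data transfer becomes the bottleneck once the datasets are genome-sized.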

First Post from Budapest

It’s taken me a while to finally get adjusted to the rhythm of life and school here in Budapest. This first post is long overdue, but this is the first real free moment I’ve had! Most of my posts here will be focused on biological and neuro concepts, but approaching them from a computer science framework. The relevant classes I’m taking in Budapest are as follows: Structure and Dynamics of Complex Networks; Advanced Algorithms for Bioinformatics; and Computational Biology and Medicine. I will try to use this blog to post some of the most interesting and relevant information that I learn from these classes.

At the beginning of my Comp Bio class, everyone chooses a gene to work with for the rest of the semester. You can find a brief write-up of my gene below. In the coming weeks, we will be learning some genome manipulation techniques and gene visualization programming.

After a quick search on genecards.org for ‘receptor’, then specifically looking for neuroreceptors, I came across the DRD2 gene. The DRD2 gene encodes a subtype of the dopamine receptor, D2. This receptor is G protein-coupled, meaning it is a transmembrane receptor that senses the presence of molecules outside the cell and activates a ‘cascade’ of events leading to internal cellular responses. The D2 receptor inhibits the activity of adenylyl cyclase, an enzyme that is activated when multiple different signals occur in parallel. Adenylyl cyclase catalyzes the conversion of ATP to cyclic AMP and pyrophosphate.

Alternative splicing of this gene results in two transcript variants, which encode different isoforms of D2. In addition, regulation of D2R expression in mice has been shown to control exploration, memory creation and synaptic plasticity. Older antipsychotic drugs are antagonists for the D2 receptor, meaning they do not trigger a response upon binding, but instead block agonists from binding to the receptor, dampening the agonists’ intended effect.

DRD2 has been associated with a multitude of diseases. What follows is just a short list of the highlights: PTSD, OCD, novelty-seeking personality, manic-depressive illness, ADHD, schizophrenia, Huntington’s disease and antisocial personality disorder.

D2 interacts with three other proteins: EPB41L1 (mediates interactions between the cytoskeleton and the plasma membrane, and binds to and stabilizes the D2 dopamine receptor at the neuronal plasma membrane), PPP1R9B (enriched in dendritic spines, which receive excitatory input) and NCS-1 (a neuronal calcium sensor that regulates synaptic transmission, nerve growth, memory and corticohippocampal plasticity). The attribute I find most fascinating is that increased levels of NCS-1 have resulted in increased spontaneous exploration in mice, perhaps indicating a role for NCS-1 in curiosity.