Jó napot kívánok! ("Good day!")

Budapest is finally starting to admit that summer is indeed over, and the city is transitioning to crisp autumn weather. The outdoor Turkish baths are shutting down, forcing bathers indoors to the ornate swimming halls. Every day I try to pick up a little more Hungarian; my vocabulary and conversations are currently limited to ordering food and describing myself (Amerikai diák vagyok: "I am an American student"). Even though the Iron Curtain fell many years ago, it is fascinating to see everyday throwbacks to how life was back then (oppressive grey apartment buildings, people pushing wheelbarrows full of potatoes down a busy street). At the same time, the city of Budapest is actually incredibly developed, with a better public transit system than I've seen anywhere in the States. While here, I found a lacrosse team to play with, and this past weekend we traveled to Serbia to compete in a multinational tournament, which we ended up winning! The team is filled with a bunch of goofballs.


Everyone is super friendly and willing to let me practice my weak Hungarian on them. Most people here, not just on the team, actually speak very good English.


In my classes, we are talking about the large datasets generated in biology, such as genome and protein sequencing, and the issues that arise in managing and analyzing that data.

The cost of sequencing an entire human genome has fallen drastically (to under $5,000, and projected to approach $1,000), and so has the time needed to perform the sequencing. But with this great technology comes the burden of overwhelming amounts of data. Scientists are now working not only on improving the sequencing techniques themselves, but on the ways the data is managed as well. The most pressing issues are data transfer, standardization of data formats, access control, and data integration.
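To make the format-standardization point concrete, here is a minimal sketch of a reader for FASTA, one of the standard plain-text formats for storing sequences; the file name at the bottom is just an illustration. Having one agreed-upon format is exactly what lets a short function like this work on data from any lab:

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Each FASTA record is a '>' header line followed by one or
    more lines of raw sequence data.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)

# Example: print the length of every sequence in a (hypothetical) file.
for name, seq in read_fasta("example.fasta"):
    print(name, len(seq))
```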

One platform that addresses the problems above is a concept known as cluster computing. The goal is to achieve supercomputer performance without actually owning a supercomputer: many computers on a single local network are linked together so that they function as one machine. This method is extremely cost effective, delivering supercomputer performance for a fraction of the price. However, the other costs involved (a specialized facility and hardware, as well as extremely knowledgeable IT support) present potential drawbacks.
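The core pattern behind a cluster is scatter-gather: split a big job into chunks, farm the chunks out to the linked machines, and merge the partial results. Here is a toy sketch of that pattern in Python, with local processes standing in for networked machines, and counting bases in a sequence as a stand-in workload:

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_bases(chunk):
    """Worker task: tally nucleotide frequencies in one chunk."""
    return Counter(chunk)

def parallel_base_count(sequence, workers=4):
    """Scatter the sequence across workers, then gather and merge."""
    size = max(1, len(sequence) // workers)
    chunks = [sequence[i:i + size] for i in range(0, len(sequence), size)]
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_bases, chunks):
            total += partial
    return total

if __name__ == "__main__":
    print(parallel_base_count("ACGT" * 1_000_000))
```

In a real cluster the same split/merge logic runs over the network (via a framework like MPI or Hadoop) rather than over local processes, which is where the specialized facility and IT support come in.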

To overcome some of these issues, many companies are switching to cloud computing for their data storage and analysis. In the cloud, an on-demand shared pool of computing resources is available whenever needed, at a very low cost. This is especially effective when the data doesn't need to be continuously accessed, but is instead read for one-off tasks. Cloud computing comes with its own set of drawbacks, such as privacy concerns about keeping health records on shared public infrastructure and the network bandwidth limits involved in uploading large datasets into the cloud.
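As a rough sketch of what "moving data into the cloud" looks like in practice, here is how a sequencing run might be pushed into AWS object storage with the boto3 library. The bucket name and file names are hypothetical, and credentials are assumed to already be configured on the machine:

```python
import boto3

# Client for Amazon S3, AWS's object-storage service.
s3 = boto3.client("s3")

# For a multi-gigabyte FASTQ file, this single call is exactly where the
# bandwidth bottleneck mentioned above shows up: the whole dataset has to
# travel over the network before any analysis can start.
s3.upload_file(
    Filename="run_042.fastq.gz",        # local sequencing output (hypothetical)
    Bucket="my-genomics-bucket",        # hypothetical bucket name
    Key="runs/2013/run_042.fastq.gz",   # object path within the bucket
)
```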

Similar to both cloud and cluster computing is the method of grid computing. In grid computing, tasks are distributed to loosely connected computers (as opposed to the single local network of computers in cluster computing). These computers can be anywhere in the world, at different companies, or even running on volunteers' laptops at home. This enables organizations to muster huge computational power at almost no cost to themselves. Like cloud computing, grid computing suffers when transferring or uploading data, and there is minimal control over the hardware the programs actually run on. One way of speeding grid computing up is heterogeneous computing, in which individual machines use accelerators, such as GPUs, effectively turning a single computer into a small cluster of its own.
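Because grid nodes share no network or filesystem with each other, each task has to be packaged as a self-contained "work unit" carrying all the data it needs, which is the pattern volunteer-computing projects use and also why data transfer is the weak point. A toy sketch of that round trip, with a made-up motif-counting task standing in for real analysis:

```python
import json

def make_work_unit(unit_id, sequence, motif):
    """Package the task AND all the data it needs into one message,
    since a grid node has no access to our files or network."""
    return json.dumps({"id": unit_id, "sequence": sequence, "motif": motif})

def run_work_unit(message):
    """What a volunteer machine would execute on its own hardware."""
    unit = json.loads(message)
    count = unit["sequence"].count(unit["motif"])
    return json.dumps({"id": unit["id"], "count": count})

# Toy round trip: in a real grid, the work unit travels over the internet
# to an untrusted, heterogeneous machine, and only the small result returns.
result = run_work_unit(make_work_unit(1, "ACGTACGTTACG", "ACG"))
print(result)  # {"id": 1, "count": 3}
```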
