I haven’t written since I started my job a bit over a month ago, and a lot has happened since then. For those who didn’t catch my last blog post, here is a quick recap: I am a programming/bioinformatics intern at OHSU, expanding data analysis software that helps accurately sequence protein samples from mass spectrometry data.
I’ve been working in this lab for the last three summers, which has given me a lot of background in the experimental and analytical sides of protein sequencing. After taking Intro to Computer Science at Puget Sound, I became interested in the computational side of protein analysis. Protein sequencing produces huge amounts of digital data, so learning how to interpret and analyze that data is an essential part of any proteomics lab.
The first thing I did at work was learn how to program in Python. I spent about a week reading books on Python and experimenting with the language. I had learned Java in Intro to Computer Science, and while Python code “looks” a lot different from Java code, the languages are actually really similar. Python is known for being easy to read but very powerful, so it was very quick to learn. However, it has so many built-in features and modules that I’m still constantly reading more about the language. UPS’s focus on writing with good programming style made learning and writing Python relatively easy. My intro class had a surprising degree of depth for an introductory survey, and I found that there were very few new concepts I had to learn. Most of what I did was look up new commands for things I already knew how to code, or learn slightly different methods for writing similar types of programs.
After learning Python, I dove right in and began expanding the analysis software to add support for a common proteomics data file type. This took about a week, including debugging. I then wrote another program to add support for a different data file type, which is produced and formatted in an entirely different way. The hardest part was learning the nuances of each file type, since there is precious little documentation for either one. Working out how each was produced and formatted required its own sort of detective work. Once I had figured that out, I worked on writing and eventually optimizing my code. My mentor, Dr. Phil Wilmarth, was a huge help throughout the process. He constantly showed me new ways to code things that I hadn’t thought of, and helped me write the best and clearest code possible. His extensive knowledge of protein bioinformatics helped me learn a ton, not only about programming but also about proteomics, informatics, and data analysis in general.
The next step will be to analyze the data that we produce using the new software. The full data analysis process can take upwards of a day to complete, so we’ll begin to analyze some data next week.
I’ve certainly learned a lot about programming and data analysis through this internship. While I had often heard that data was integral to computer science, I never realized how closely the two were interrelated, since I had not “seen” the connection first-hand. I’m also amazed at how much hard work programmers and scientists put into the data analysis side of chemistry. While this often goes unnoticed when doing bench work, it is a really interesting side of chemistry that I definitely hope to keep learning about. UPS offers classes in computational chemistry and programming that do a great job of integrating science and technology, so taking those will certainly be a next step in learning more about informatics and scientific data analysis.