Imagine you have a lot of nucleotide sequence identifiers and want to find out what organism the DNA is from. You could go to the NCBI website and spend a long time finding out, or you could write a short Python script using BioPython to fetch the header of the FASTA record each identifier refers to:
Before today, the only real use I’d had for regular expressions in Python was to just find the first instance of a pattern. For example, if I want to find the contents of the text between the first set of single quotation marks (in this case ‘26245730’), I would proceed like so:
    import re

    all_id = "'26245730': 817, '389595538': 735, '541129065': 529, '541129071': 340, '558870185': 305, '444325280': 287, '573974252': 272, '281314044': 222"

    # search returns only the first match of the pattern
    first_id = re.search("'(.*?)'", all_id)

    # group(1) is the text captured by the brackets
    print(first_id.group(1))
The arguments passed to re.search define the pattern I am looking for. The single quotation marks on either side of the brackets show that I am looking for a pattern between them. The “.” within the brackets tells Python that I am happy with finding any character (letter, number, etc., though not a newline by default), and the “*” next to it means it will look for 0 or more instances of that character. Finally, the “?” ensures that the expression isn’t greedy. What does it mean to be greedy with a regular expression? It means that instead of finding the pattern between the first two single quotation marks, it will find the pattern between the first and the last quotation marks! So I’ll end up with practically all of my string being returned!
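To make the greedy versus non-greedy difference concrete, here is a minimal sketch using a shortened version of the string above. The variable names lazy, greedy and every_id are only illustrative, and using re.findall to collect every identifier is my assumption about the natural next step rather than something shown in the original code.

```python
import re

# Shortened version of the identifier string, for readability
all_id = "'26245730': 817, '389595538': 735, '541129065': 529"

# Non-greedy: the match stops at the first closing quotation mark
lazy = re.search(r"'(.*?)'", all_id).group(1)

# Greedy: the match runs on to the last quotation mark in the string
greedy = re.search(r"'(.*)'", all_id).group(1)

# findall returns the captured group for every non-overlapping match
every_id = re.findall(r"'(.*?)'", all_id)

print(lazy)
print(greedy)
print(every_id)
```

The non-greedy search gives back just the first identifier, while the greedy one swallows everything up to the final quotation mark, counts and commas included.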
C++ is the most recent language I’ve acquired, and while I certainly still have a lot to learn, I’ve gained a reasonable understanding from the following free resources online:
- LearnCpp.com – Learn C++
- tutorialspoint – The C++ Programming Language
- cplusplus.com – C++ Language
- XoaX.net – C++ Video Tutorials
- Google Developers – C++ Class
- Cprogramming.com – Programming Tutorials – C, C++, OpenGL, STL
- University of Southern Queensland – Object oriented programming in C++
- Stephan T Lavavej – Core C++
- Professor Peter Sommerlad – Modern and Lucid C++ for Professional Programmers
- Software Design Using C++
**UPDATE**
I was recently alerted to the fact that some of these resources can contain incorrect information/bad practices. Here’s a great list of C++ resources which have been vetted by the reddit learnprogramming community. One recommended resource is a book titled “C++ Primer 5th edition”. It was published in 2012 so it covers the C++11 standard.
Here’s a list of some of the websites, books and videos (for both Python and C++), that I have found very useful while learning OpenCV.
Python
- Using OpenCV with Python and ROS (Video) – Great introduction to computer vision and OpenCV.
- PyImageSearch – Website has a lot of great tutorials on many different applications of OpenCV.
- OpenCV-Python Tutorials (Website) – Official Python documentation and tutorials.
- OpenCV-Python Tutorials (Website) – Comprehensive set of tutorials.
- OpenCV Python (Website) – The original Python tutorials I started with, most of these have been ported to the website above.
- OpenCV 3 With Python (Website) – Great list of Python tutorials.
- Tentative NumPy Tutorial (Website) – As OpenCV reads images into NumPy arrays, it’s useful to have an understanding of this library.
- Python – Getting started with OpenCV (Website) – Introduction to using Python with OpenCV.
- Programming Computer Vision with Python (Book) – How to use Python to build computer vision programs.
- OpenCV Computer Vision with Python (Book) – Short book on getting started with OpenCV and Python.
- Building an image processing pipeline with Python (Video) – Talk about how Python and OpenCV are used in industry.
When I first started learning OpenCV, I was working exclusively with Python. While I am still a huge fan of the language, today all of my OpenCV programs are written in C++. Why?
- Some of the deeper functionality of OpenCV has not been completely ported to Python (although hopefully the release of OpenCV 3.0 will fix most of these issues).
- Most in-depth textbooks on image processing and computer vision that cover OpenCV use C++ as their primary language. It was therefore easier to learn from these resources by adopting the language.
- A lot of the computer vision techniques I use (SIFT, machine learning, etc), are better documented in C++.
- Passing images back and forth between NumPy arrays has overhead that C++ doesn’t have to worry about.
- As has been suggested to me by Carl Bell, Python struggles to perform well with overloaded functions.
Lately, there have been quite a few articles (such as here and here) discussing the use of the university lecture and its perceived shortcomings. The question is asked: is the lecture an unengaging relic from before the digital age? Are lectures really the best way to learn? I attended many lectures throughout my undergraduate studies, and have also tried out massive open online courses (MOOCs) from providers such as Coursera. To be perfectly honest, I don’t mind lectures at all. While they could be dull at times, they had the advantage of sitting you in a room where information was being fed to you whilst you were (mostly) shielded from the distractions of the outside world. Having a timetable of lectures was a powerful way to organise your learning (especially as you were paying for this education). It takes a lot of motivation to sit through a fraction of the lectures each week from online courses.
Python’s random module makes it extremely easy to generate random DNA bases.
    import random

    dna = ["A", "G", "C", "T"]

    # output a random base
    print(random.choice(dna))
Now to generate a specific number of random bases, all we have to do is use Python’s range function:
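The code for this step is not included here, so the following is only a sketch of one way it could look. The function name random_bases and the length of 20 are my own illustrative choices; the idea is simply to call random.choice once per position produced by range and join the results.

```python
import random

dna = ["A", "G", "C", "T"]


def random_bases(length):
    """Return a string of `length` randomly chosen bases (hypothetical helper)."""
    # range(length) drives one random.choice call per base
    return "".join(random.choice(dna) for _ in range(length))


# 20 is an arbitrary example length
sequence = random_bases(20)
print(sequence)
```

Each run prints a different 20-base sequence; calling random.seed with a fixed value first would make the output repeatable.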
If you want to compile and run a C++ program using OpenCV in Sublime Text, then copy and paste the code below into a build system file. If you’re not interested in explicitly using C++11 you can delete the “-std=c++0x” section.
    {
        "shell_cmd": "g++ -std=c++0x ${file} -o ${file_base_name} `pkg-config --cflags --libs opencv` && ./${file_base_name}"
    }
Here are several talks from PyCon 2014 I thought looked rather interesting from a research perspective:
Here’s a list of a few R tutorials (in addition to the one I wrote), which I’ve found (or look) rather useful:
Google Developers R Tutorials
A slightly Different Introduction to R