Earlier in the year I had to reset the observation hive, and sadly the process killed a fair few bees, whose bodies ended up on the ground. I knew there were large carnivorous bull ants in the area, so I fully expected them to clear away the bodies within a few hours. Unexpectedly, though, none of the bodies were removed.
After a couple of days I took a closer look and noticed something interesting: the bull ants were nowhere to be seen, but the tiny ants* in the area had found the bodies! I’ve circled a few of them in the image below.
Imagine you have a lot of nucleotide sequence identifiers and want to find out which organism each piece of DNA comes from. You could go to the NCBI website and spend a long time looking each one up, or you could write a short Python script using BioPython that retrieves the FASTA header for each identifier.
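Here is a minimal sketch of that idea, assuming the identifiers are NCBI nucleotide UIDs (GI numbers or accessions) and that Biopython is installed. It uses Biopython's `Entrez.efetch` with `rettype="fasta"` to download the records, then keeps only the header lines, which normally name the organism. The helper names `fetch_fasta` and `fasta_headers` and the placeholder email address are my own, not part of any original script.

```python
def fasta_headers(fasta_text):
    """Return the header (without the leading '>') of every record in FASTA text."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def fetch_fasta(ids, email):
    """Download FASTA records for a list of nucleotide IDs from NCBI (needs network access)."""
    # Imported inside the function so fasta_headers() works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks every E-utilities user to supply a contact address
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids), rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


# Hypothetical usage (performs a live NCBI request):
# text = fetch_fasta(["26245730", "389595538"], email="you@example.com")
# for header in fasta_headers(text):
#     print(header)
```

Splitting the download from the header parsing keeps the parsing easy to check offline; you could equally parse the handle with `SeqIO.parse(handle, "fasta")` and read each record's `description`.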
Before today, the only real use I’d had for regular expressions in Python was to find the first instance of a pattern. For example, if I want to extract the text between the first pair of single quotation marks (in this case ‘26245730’), I would proceed like so:
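```python
import re

# A string of quoted IDs, each followed by a count
all_id = "'26245730': 817, '389595538': 735, '541129065': 529, '541129071': 340, '558870185': 305, '444325280': 287, '573974252': 272, '281314044': 222"

# Capture the (non-greedy) text between the first pair of single quotes
first_id = re.search(r"'(.*?)'", all_id)
print(first_id.group(1))  # 26245730
```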
The pattern passed to re.search describes what I am looking for. The single quotation marks on either side of the brackets mean the match must start and end with a quote, and the brackets themselves form a capturing group, which is what group(1) returns. The “.” inside the brackets matches any single character (letter, digit, punctuation, anything except a newline), and the “*” after it means “zero or more of the preceding item”. Finally, the “?” makes the “*” non-greedy. What does it mean for a regular expression to be greedy? A greedy “.*” matches as much text as it possibly can, so instead of stopping at the second quotation mark, the match runs from the first quotation mark all the way to the last one in the string. I’d end up with practically the whole string returned!
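To see the difference, here is a small comparison on a shortened copy of the string (the shortening is mine, just to keep the output readable). The only change between the first two patterns is the “?”. The last line uses `re.findall`, which applies the non-greedy pattern repeatedly and returns the captured group for every match rather than just the first.

```python
import re

all_id = "'26245730': 817, '389595538': 735, '541129065': 529"

# Greedy: .* grabs as much as possible, then backs off to the LAST quote in the string
greedy = re.search(r"'(.*)'", all_id).group(1)

# Non-greedy: .*? stops at the first closing quote it reaches
lazy = re.search(r"'(.*?)'", all_id).group(1)

# findall returns the captured group for every non-overlapping match
every_id = re.findall(r"'(.*?)'", all_id)

print(greedy)    # 26245730': 817, '389595538': 735, '541129065
print(lazy)      # 26245730
print(every_id)  # ['26245730', '389595538', '541129065']
```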