Python Regular Expressions

Before today, the only real use I’d had for regular expressions in Python was finding the first instance of a pattern. For example, if I want the text between the first pair of single quotation marks (in this case ‘26245730’), I would proceed like so:

import re

all_id="'26245730': 817, '389595538': 735, '541129065': 529, '541129071': 340, '558870185': 305, '444325280': 287, '573974252': 272, '281314044': 222"

first_id = re.search("'(.*?)'", all_id)
print(first_id.group(1))

The pattern passed to re.search defines what I am looking for. The single quotation marks on either side of the parentheses mean the match must start and end with a quotation mark, and the parentheses form a capturing group, which is why group(1) returns only the text between the quotes. The “.” inside the parentheses matches any character (except a newline, by default), and the “*” after it means zero or more of those characters. Finally, the “?” makes the “*” non-greedy. What does it mean for a regular expression to be greedy? Instead of stopping at the second quotation mark, the match would run from the first quotation mark all the way to the last one in the string, so I’d end up with practically my whole string being returned!
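To see the greedy and non-greedy behaviour side by side, here is a small sketch. It uses a shortened copy of the string (my own trimming, just to keep the output readable), and the names lazy and greedy are only for illustration:

```python
import re

# Shortened version of the string above (an assumption, for readability).
all_id = "'26245730': 817, '389595538': 735, '541129065': 529"

# Non-greedy: ".*?" stops at the first closing quotation mark it can.
lazy = re.search("'(.*?)'", all_id).group(1)

# Greedy: ".*" runs on to the last quotation mark in the string.
greedy = re.search("'(.*)'", all_id).group(1)

print(lazy)    # 26245730
print(greedy)  # 26245730': 817, '389595538': 735, '541129065
```

The only difference between the two patterns is the “?” after the “*”.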

Now, what if I wanted to find every single number between single quotation marks and output it as a list?

import re

all_id="'26245730': 817, '389595538': 735, '541129065': 529, '541129071': 340, '558870185': 305, '444325280': 287, '573974252': 272, '281314044': 222"

unique_id = re.findall("'(.*?)'", all_id)
print(unique_id)

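You’ll notice how similar it is to the code example I showed before, except that “findall” is used rather than “search”. Because the pattern contains a single capturing group, findall returns a list of the captured strings rather than a match object.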
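Here is a rough comparison of what the two functions hand back. Two things in it are my own assumptions rather than part of the examples above: the string is shortened, and the pattern uses \d+ (one or more digits) instead of .*? so that only numbers between quotation marks are matched:

```python
import re

# Shortened version of the string above (an assumption, for readability).
all_id = "'26245730': 817, '389595538': 735, '541129065': 529"

# re.search returns a match object for the first hit, or None if there is none.
first = re.search(r"'(\d+)'", all_id)
first_id = first.group(1) if first else None

# re.findall returns a list of every captured group, or an empty list.
every_id = re.findall(r"'(\d+)'", all_id)

print(first_id)  # 26245730
print(every_id)  # ['26245730', '389595538', '541129065']
```

Checking for None before calling group(1) avoids an AttributeError when nothing matches. findall needs no such check, since it simply returns an empty list.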

About the author, Jack Simpson: Computational biology PhD candidate at the Australian National University. I love writing (both articles and software), learning more about the world around us, and beekeeping. I also write for BioSky.co
