
Find sequence descriptions with BioPython

by Jack Simpson

Imagine you have a lot of nucleotide sequence identifiers and want to find out which organism each sequence comes from. You could look each one up on the NCBI website, which takes a long time, or you could write a short Python script that uses BioPython to fetch the FASTA record for each identifier and print its header:

import re
from Bio import Entrez
from Bio import SeqIO

# My identifiers were buried in some clutter, so I extracted them with a regular expression
all_id = "'26245730': 817, '389595538': 735, '541129065': 529, '541129071': 340, '558870185': 305, '444325280': 287, '573974252': 272, '281314044': 222"
unique_id = re.findall(r"'(.*?)'", all_id)

# NCBI asks for a contact email address with Entrez requests
email = "my@email.com"
Entrez.email = email
for each_id in unique_id:
    # Fetch the record as FASTA text and parse the single record it contains
    fetch_seq = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text", id=each_id)
    seq_record = SeqIO.read(fetch_seq, "fasta")
    fetch_seq.close()
    print(seq_record.description)

This script takes your list of sequence identifiers and prints the description line of each FASTA record, which typically names the source organism.
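With a longer list, one request per identifier gets slow. Here is a rough sketch of a batched variant, not a definitive implementation. It relies on `Entrez.efetch` accepting a comma-separated string of identifiers and on `SeqIO.parse` reading several records from one handle. The helper names `extract_ids`, `chunk` and `fetch_descriptions` are my own for illustration, and the batch size of 100 is an arbitrary assumption rather than a documented limit.

```python
import re


def extract_ids(raw):
    """Return the quoted identifiers found in `raw`, in order, without duplicates."""
    found = re.findall(r"'(.*?)'", raw)
    # dict.fromkeys keeps the first occurrence of each identifier and preserves order
    return list(dict.fromkeys(found))


def chunk(ids, size):
    """Split `ids` into consecutive lists of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_descriptions(ids, email, batch_size=100):
    """Fetch FASTA records in batches and return their description lines.

    `batch_size` is an illustrative default, not an NCBI-documented value.
    """
    # Imported here so the two helpers above can be used without Biopython installed
    from Bio import Entrez, SeqIO

    Entrez.email = email
    descriptions = []
    for batch in chunk(ids, batch_size):
        # One request for the whole batch: identifiers joined with commas
        handle = Entrez.efetch(db="nucleotide", rettype="fasta",
                               retmode="text", id=",".join(batch))
        try:
            # SeqIO.parse (rather than SeqIO.read) handles multiple records
            for record in SeqIO.parse(handle, "fasta"):
                descriptions.append(record.description)
        finally:
            handle.close()
    return descriptions
```

Calling `fetch_descriptions(extract_ids(all_id), "my@email.com")` with the `all_id` string from the script above should then return the descriptions as a list, using one request per batch instead of one per identifier.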
