Pyral (Python + Viral) was the name of a project I worked on in Dr Joanne Macdonald’s lab between September 2012 and January 2013 (although I still provide tech support for the code and help manage the server to this day). During this time I wrote a lot of Perl and Python code to run on the university’s Linux server. The aims of these programs were as follows:
- Download all the viral RefSeq genomes from GenBank;
- BLAST a sequence of interest and retrieve all similar files;
- Concatenate all sequences into one file and run it through CD-HIT;
- Analyse the CD-HIT output, returning a file with the cluster numbers that sequences of interest may be found in;
- Find variable-length conserved regions of DNA within a designated cluster;
- Ensure each conserved region of DNA is completely dissimilar to sequences found in other virus clusters.
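
To give a flavour of the CD-HIT analysis step, here is a minimal Python sketch of how the cluster file might be scanned for sequences of interest. It is not the original Pyral code: the function name `clusters_containing` and the sequence IDs and lengths in the sample are made up for illustration, and it assumes the usual `.clstr` layout in which each cluster begins with a `>Cluster N` line and each member line contains `>sequence_id...`.

```python
import re

# Captures the sequence ID from a member line such as
# "0\t9000nt, >virus_A... *" (assumed .clstr layout).
MEMBER_RE = re.compile(r">(\S+?)\.\.\.")


def clusters_containing(clstr_lines, ids_of_interest):
    """Map cluster number -> IDs of interest found in that cluster."""
    wanted = set(ids_of_interest)
    hits = {}
    current = None  # number of the cluster currently being read
    for line in clstr_lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None and match.group(1) in wanted:
            hits.setdefault(current, []).append(match.group(1))
    return hits


# Hypothetical cluster file contents, for illustration only.
example = """>Cluster 0
0\t9000nt, >virus_A... *
1\t8950nt, >virus_B... at +/91.20%
>Cluster 1
0\t7000nt, >virus_C... *
""".splitlines()

print(clusters_containing(example, ["virus_B", "virus_C"]))
```

IDs are matched exactly as they appear in the cluster file, so any truncation CD-HIT applies to long sequence names would need to be accounted for before comparing.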