# Which Blast database is needed with hole filler?


Hello,

I was trying out the hole-filler capabilities of PathoLogic (v19). I've gone through the documentation and articles. BLAST+ is installed and on my path, and I have .ncbirc set up. But one basic question eludes me: which database(s) should I install? The genome, transcriptome, and/or proteome of the organism I'm analyzing? Or the files from the most closely related organism in the curated collection?

In my first runs, when I tried to configure the BLAST databases from the Pathway Tools menu, I saw messages like this in my terminal window:

```
Warning: Protein
#<G56-10049-MONOMER instance frame in...
The sequence length of this protein could not be computed.
This protein will be ignored.
```


I take this to mean that it was expecting a protein database but did not find one. (There is a database for this in my BLAST data directory.)

Thanks, Joe Carlson



Joe, under the Tools menu there's an option, "Prepare BLAST Reference Data" (see User Guide section 3.11.3.13). It uses the PGDB and the provided sequence data to create the BLAST database for you, and the resulting BLAST database files are written to the xxxcyc/VERSION/data directory. The Hole Filler then uses this BLAST database as its reference. You can also look at the protseq.fsa and dnaseq.fsa files for your genome. It's possible that the source sequence you used with PathoLogic has problems, so inspecting these files for errors is a good next step.
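As a starting point for that inspection, here is a minimal Python sketch that scans a protein FASTA file such as protseq.fsa and reports records that have an empty sequence or contain characters outside a protein alphabet. It assumes the file is ordinary FASTA (`>` header lines followed by sequence lines). The accepted alphabet (20 standard amino acids plus the ambiguity/extended codes B, Z, X, U, O, J and `*`) and the script name `check_fasta.py` are my own assumptions, not anything defined by Pathway Tools, and this does not reproduce whatever checks Pathway Tools itself performs.

```python
import sys
from pathlib import Path

# Assumed protein alphabet: 20 standard amino acids, the ambiguity/extended
# codes B, Z, X, U, O, J, and '*' for a stop. Adjust to taste.
VALID_PROTEIN = set("ACDEFGHIKLMNPQRSTVWYBZXUOJ*")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_problems(lines, alphabet=VALID_PROTEIN):
    """Return (header, reason) tuples for empty or suspicious records."""
    problems = []
    for header, seq in read_fasta(lines):
        if not seq:
            problems.append((header, "empty sequence"))
            continue
        bad = sorted(set(seq.upper()) - alphabet)
        if bad:
            problems.append((header, "unexpected characters: " + "".join(bad)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_fasta.py path/to/protseq.fsa
    with Path(sys.argv[1]).open() as fh:
        for header, reason in find_problems(fh):
            print(f"{header}: {reason}")
```

For dnaseq.fsa you could pass a nucleotide alphabet instead, for example `find_problems(fh, alphabet=set("ACGTN"))`, again choosing the allowed characters to suit your data.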
