The Genbank parsing problem

Oh BioRuby, you are just as capricious and inconsistent as Ruby itself.

You can parse GenBank files with BioRuby like this:

require 'bio'

puts "Parsing seqs ..."
Bio::FlatFile.auto("foo.genbank").each_entry { |gb|
   puts "Sequence '#{gb.to_biosequence.entry_id}'"
}
puts "Finished."

which prints the ID of every sequence in the file. However, if the file ends with blank lines after the GenBank record terminator (//):

2161 agagcccgaa ttgatgcacg gattgatttc gaatctggaa ggataaagaa agaggaattc
2221 gctgagatca tgaagacctg ttccaccatt gaagacctca gacggcaaaa atag
//
<blank line here>

as the fetch functionality in BioRuby produces, then BioRuby reads the trailing blank lines as an additional, empty record:

Parsing seqs ...
Sequence 'CY011043'
Sequence ''
Finished.
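
To see how a trailing blank line can turn into a phantom record, here is a minimal sketch in plain Ruby. It is not BioRuby's actual code: split_on_terminator is a hypothetical helper that simply reads chunks ending at the "//" terminator line, on the assumption that a flat-file reader works roughly this way. Whatever is left after the last terminator comes back as one more chunk, even if it is only a newline.

```ruby
require 'stringio'

# Hypothetical helper (not part of BioRuby): read chunks that each end
# at the GenBank terminator, i.e. "//" on a line of its own.
def split_on_terminator(text, delimiter = "\n//\n")
  io = StringIO.new(text)
  chunks = []
  # gets(delimiter) returns text up to and including the delimiter,
  # or whatever remains if the delimiter is not found again.
  while (chunk = io.gets(delimiter))
    chunks << chunk
  end
  chunks
end

with_blank = "LOCUS       CY011043\n//\n\n"

p split_on_terminator(with_blank).size         # 2: the record plus a lone "\n"
p split_on_terminator(with_blank.rstrip).size  # 1: only the record
```

The second call shows why trimming trailing whitespace helps: nothing is left over after the final terminator, so no extra chunk is produced.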

Nice. You can work around this by trimming the trailing blank lines before handing the data to the parser:

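require 'bio'
require 'stringio'

puts "Parsing seqs ..."
data = File.open("foo.genbank", "rb") { |f| f.read }
# rstrip, not rstrip!: the bang form returns nil if nothing is trimmed
buffer = StringIO.new(data.rstrip, 'rb')
Bio::FlatFile.auto(buffer).each_entry { |gb|
   puts "Sequence '#{gb.to_biosequence.entry_id}'"
}
puts "Finished."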

to give:

Parsing seqs ...
Sequence 'CY011043'
Finished.
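
One detail matters when trimming: String#rstrip! returns nil when the string has no trailing whitespace, so on an already-clean file it would hand nil to StringIO. The sketch below wraps the trim in a small helper that avoids this. The name trimmed_genbank_io is made up for illustration and is not part of BioRuby.

```ruby
require 'stringio'

# Hypothetical helper: wrap GenBank text in a StringIO with any
# trailing whitespace (including blank lines) removed.
def trimmed_genbank_io(text)
  # rstrip always returns a string, even when there is nothing to strip
  StringIO.new(text.rstrip)
end

clean = "LOCUS       CY011043\n//"

p clean.dup.rstrip!                                 # nil, nothing to strip
p trimmed_genbank_io(clean).read == clean           # true
p trimmed_genbank_io(clean + "\n\n").read == clean  # true
```

The returned object can be passed to Bio::FlatFile.auto in place of a filename, as in the workaround above.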