cjfields at uiuc.edu
Wed Feb 6 16:48:45 EST 2008
On Feb 6, 2008, at 2:57 PM, Susan J. Miller wrote:
> Barry Moore wrote:
>> I'm joining this discussion late so my apologies if I'm missing the
>> original point. If you're trying to routinely download thousands
>> of sequences from GenBank or SeqHound you probably want to be using
>> ftp to download the flat files and query/parse locally. If you're
>> trying to stay on top of the latest Drosophila ESTs, then how about
>> setting up a nightly cron job to download the incremental updates
>> from NCBI's ftp (ftp://ftp.ncbi.nih.gov/genbank/daily-nc) and parse
>> that for Drosophila EST sequences. The EST division is huge, but I
>> would think nightly incrementals should be manageable.
> Hi Barry,
> I'll try your suggestion. I guess my interpretation of the
> documentation for SeqHound was erroneous. (Who knows what 'large
> numbers of sequences' means?) I tried using SeqHound's
> get_Stream_by_id method to fetch 10000 sequences, 500 at a time, and
> got a timeout error.
Barry's and Brian's suggestions make more sense. You could also
automate an Entrez query that limits retrieval to a date range
instead of munging through the latest releases; it all depends on
how many sequences you need to parse.
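
As a rough illustration of the date-limited Entrez query, here is a minimal sketch that only builds an E-utilities esearch URL and does not contact NCBI. The function name build_recent_est_query, the search term, and the choice of db="nucest" are my assumptions for the example; that database name may no longer exist as a separate Entrez database, so check the current E-utilities documentation before relying on it.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_recent_est_query(days_back, today=None, db="nucest"):
    """Build an esearch URL restricted to records modified recently.

    Hypothetical helper: the db name and search term are assumptions
    chosen for illustration, not values taken from this thread.
    """
    today = today or date.today()
    start = today - timedelta(days=days_back)
    params = {
        "db": db,
        "term": "Drosophila[Organism]",
        "datetype": "mdat",  # filter on modification date
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
        "usehistory": "y",  # keep the hits on the server for a later efetch
    }
    return ESEARCH_URL + "?" + urlencode(params)
```

Calling build_recent_est_query(1) from a nightly job would give a URL covering roughly the last day of updates; the IDs it returns would still need to be fetched in a second step.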
The SeqHound timeout may be set up on their end to prevent a single
server from spamming them with tons of requests. NCBI is a bit more
tolerant but can be brittle when its servers are busy.
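
If you do keep fetching remotely, one way to be gentler on the server is to split the ID list into batches and pause between requests. The sketch below assumes a caller-supplied fetch_batch function (hypothetical; it stands in for whatever retrieval call you use), and the batch size and pause length are arbitrary defaults, not limits published by SeqHound or NCBI.

```python
import time


def batches(ids, size):
    """Yield consecutive slices of ids, each with at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(ids, fetch_batch, size=500, pause_seconds=1.0):
    """Fetch ids in batches, sleeping between consecutive requests.

    fetch_batch is a hypothetical callable that takes a list of ids
    and returns a list of results for that batch.
    """
    results = []
    for i, chunk in enumerate(batches(ids, size)):
        if i > 0:
            time.sleep(pause_seconds)  # no pause before the first request
        results.extend(fetch_batch(chunk))
    return results
```

This does not guarantee you stay under any server-side limit; it only spaces the requests out so that one client is less likely to look like a flood.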