[pybsddb] Problem scanning large hashed database

andrew andrew at reurbanise.co.nz
Fri Dec 5 01:20:49 CET 2008


On Fri, 2008-12-05 at 01:09 +0100, Jesus Cea wrote:
> andrew wrote:
> > On Thu, 2008-12-04 at 21:47 +0100, Jesus Cea wrote:
> >> Anyway, fully scanning the database seems a bad thing to do. You don't
> >> need BDB for that. Are you sure there is no other way?
> > 
> > I guess the problem is that 99% of the time we're just reading and
> > writing single objects via a hash index (I presume this is what you get
> > if you're not using btrees). Another possibility is to create a separate
> > sortable index for the update time of each object, since that's the
> > second most important access method, i.e., give me the last 50K objects
> > updated. However, I have no idea what the maintenance overhead of that
> > would be and how much it would slow down the 99% of hashed reads /
> > writes.
> 
> I would recommend storing the indexes you need. If you update them
> in the same transaction, and your DB cache is big enough, the
> performance hit should be minimal. Anything is better than scanning
> the entire database by hand.
> 
> If you want to query by range, like in your example, keep that index in
> a btree. That btree can be in another database file, or in the same
> file, with a separate logical database created as a btree.
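The pattern described above (a primary hash store plus a btree-ordered secondary index on update time, both written in the same transaction) can be sketched in plain Python. This is just an illustration of the bookkeeping, not bsddb3 API: a dict stands in for the DB_HASH primary and a sorted list for the DB_BTREE index, and the `Store` class and its method names are invented for the example. In real bsddb3 code both writes would go inside a single `DBEnv.txn_begin()` transaction.

```python
import bisect
import time

class Store:
    """Sketch: primary hash store plus a time-ordered secondary index.

    A dict plays the DB_HASH primary; a sorted list of
    (updated_at, key) pairs plays the DB_BTREE index. Keeping both
    in step on every put() is the maintenance overhead discussed in
    the thread.
    """

    def __init__(self):
        self.primary = {}    # key -> (updated_at, value)
        self.by_time = []    # sorted [(updated_at, key), ...]

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        old = self.primary.get(key)
        if old is not None:
            # Drop the stale index entry before inserting the new one.
            i = bisect.bisect_left(self.by_time, (old[0], key))
            del self.by_time[i]
        self.primary[key] = (now, value)
        bisect.insort(self.by_time, (now, key))

    def get(self, key):
        entry = self.primary.get(key)
        return None if entry is None else entry[1]

    def last_updated(self, n):
        """Return the n most recently updated keys, newest first --
        the 'give me the last 50K objects updated' query, answered
        by a range scan over the index instead of a full scan."""
        return [k for _, k in reversed(self.by_time[-n:])]
```

With BDB itself, the btree index could live as a second logical database in the same file, as suggested above; the point of the sketch is only that each update touches two structures, so the extra cost per write is one ordered insert (plus one delete on re-update).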

Thanks for your advice, Jesus. I'll look into creating that separate
index.

Cheers, Andrew.
