[pybsddb] len() sometimes slow for bsddb3 backed shelves

Gregg Lind gregg at renesys.com
Fri Sep 19 17:30:17 CEST 2008


Hello list!

I hope you can help me understand a problem I'm having.

Sometimes getting the number of keys in a bsddb database is very slow.

For example:

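import shelve

import bsddb3


def MbToBytes(mb):
    # Assumed helper (definition not shown): converts megabytes to bytes.
    return mb * 1024 * 1024


def fastShelfOpen(filename, flag='c', protocol=None, writeback=False,
                  cachesize=100, **kwargs):
    if not cachesize:
        raise ValueError("cachesize must be defined")
    cachesize = int(MbToBytes(cachesize))
    # TODO: handle more optional hashopen() arguments
    fh = bsddb3.hashopen(filename, flag=flag, cachesize=cachesize)
    # Extra keyword arguments are passed through to shelve.Shelf.
    return shelve.Shelf(fh, protocol=protocol, writeback=writeback, **kwargs)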


These shelves are reasonably small (~115 MB, with 30,000 keys). I open
them with 25 MB of cache.

Sometimes len(S) takes upwards of 3 minutes, and sometimes it is
instantaneous, especially if len() has been called recently! Is the
problem in the shelve layer? File I/O?
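A small timing harness makes the cold-versus-warm difference easy to reproduce. This is only a sketch: `time_len` is a hypothetical helper (not part of shelve or bsddb3), and a plain dict stands in for the shelf so it runs without a database file.

```python
from timeit import default_timer


def time_len(mapping, calls=2):
    """Call len(mapping) `calls` times; return a list of (length, seconds).

    Hypothetical diagnostic helper: comparing the first (cold) call with
    the later (warm) calls shows whether len() is only slow the first time.
    """
    results = []
    for _ in range(calls):
        start = default_timer()
        n = len(mapping)
        elapsed = default_timer() - start
        results.append((n, elapsed))
    return results


if __name__ == "__main__":
    # A plain dict stands in for a shelf so the sketch runs anywhere.
    demo = dict((i, str(i)) for i in range(30000))
    for n, secs in time_len(demo, calls=3):
        print("len=%d took %.6f s" % (n, secs))
```

With a real shelf, the same call would be `time_len(fastShelfOpen('data.db', flag='r', cachesize=25))`, where `'data.db'` is a placeholder filename.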

Any insight would be appreciated.  Working with bsddb files seems to be 
something of a black art still. 

Thanks,

Gregg Lind
Data Engineer
Renesys Corp.


