[pybsddb] Closing a large read only btree takes a long time - possible to use DB_NOSYNC?

Chris Mulligan chris at polimetrix.com
Wed Mar 26 21:32:23 CET 2008


Hello folks,

First, I apologize for what may be a stupid question.

Second, some system details: Ubuntu 7.04 AMD64, Python 2.5.1. It appears
that I'm using version 4.4.52 (import bsddb, bsddb.__version__). The
hardware is plenty beefy on this machine, 4 Opteron cores @ 2.6GHz, 8GB
of RAM, and a large, fast SCSI array.

I'm working with some fairly large btree files (4-5GB on disk, ~1-2
million keys) and I've occasionally encountered a slightly odd
performance problem. I open them up as read only with a 256MB cache
using bsddb.btopen() and they're reasonably fast to work with. When I
close the files with foo.close() they take a very long time. Keep in mind
that these are being used in read only mode and nothing has changed.

Closing a file with ~1 million keys, and ~350MB on disk takes about 5
minutes. 

Closing a file with about the same number of keys, but 3GB on disk,
takes 20 minutes.

After spending a long time reading various scripts and files about bsddb
and Berkeley DB in general I'm beginning to wonder whether calling
DB->close with the DB_NOSYNC option would help. It appears to be
unsupported in python, but I admit that the bsddb code has confused me
slightly. 
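
A rough sketch of the idea, not tested against this setup: it skips
bsddb.btopen() and goes through the lower-level bsddb.db interface, on
the assumption that DB.close() there accepts a flags argument and that
the module exposes DB_NOSYNC. The helper names open_readonly_btree and
timed_close are made up for illustration, and the bsddb3 fallback import
is likewise an assumption.

```python
import time


def open_readonly_btree(path, cache_bytes=256 * 1024 * 1024):
    """Open a btree file read-only via the lower-level bsddb.db API.

    Returns the DB handle together with the DB_NOSYNC constant so the
    caller can pass it to close().
    """
    # Imported lazily so timed_close() below can be used on its own.
    try:
        from bsddb import db   # Python 2 standard library
    except ImportError:
        from bsddb3 import db  # assumption: standalone pybsddb package

    handle = db.DB()
    # set_cachesize(gbytes, bytes) has to be called before open().
    handle.set_cachesize(0, cache_bytes)
    handle.open(path, dbtype=db.DB_BTREE, flags=db.DB_RDONLY)
    return handle, db.DB_NOSYNC


def timed_close(handle, flags=None):
    """Close the handle and return the elapsed wall-clock seconds."""
    start = time.time()
    if flags is None:
        handle.close()       # default close, which may flush the cache
    else:
        handle.close(flags)  # e.g. DB_NOSYNC to skip the flush
    return time.time() - start


# Hypothetical usage (the path is a placeholder):
#   handle, nosync = open_readonly_btree("/path/to/file.db")
#   print(timed_close(handle, nosync))
```

Timing close() with and without the flag on the same file should show
whether the sync step is really where the time goes.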

Does anyone have any thoughts on how this might be improved, or whether
I'm just off my rocker? I'd be happy to provide more exact numbers, run
some profiling, etc., if it would help.

Thanks a lot,
Chris

--
Chris Mulligan
 



