[pybsddb] File-like interface for get/put?

Jesus Cea jcea at argo.es
Wed Aug 13 19:55:36 CEST 2008



David Wolever wrote:
| Hey,
| I'm considering using bsddb as a storage backend for a storage
| service.  One of the requirements, however, is that the data being
| stored should never be in memory all at once because it might be
| "very big".  The obvious solution to this is to return a file-like
| object which can be read() in small hunks.
|
| So, back to bsddb.  I've noticed that get/put accept dlen/doff

Management of huge records, even using dlen/doff, is far from optimal:
they are stored in "overflow" pages, which are slower to access, and any
write rewrites the entire record, even if it modifies a single byte.
This last point is especially important if you are using replication.
Oracle is aware of it; maybe they will solve it in the future. Beware
also of a single object eating your entire BDB cache.
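
To make the dlen/doff point concrete, something like this (untested
sketch using the bsddb3 "db" API; the file name and sizes are just for
illustration):

    from bsddb3 import db

    d = db.DB()
    d.open("blobs.db", dbtype=db.DB_BTREE, flags=db.DB_CREATE)

    # One huge record; BDB stores it in overflow pages.
    d.put("key", "x" * (10 * 1024 * 1024))

    # Read only bytes [4096, 8192): dlen is the length to fetch,
    # doff the offset inside the stored data.
    chunk = d.get("key", dlen=4096, doff=4096)

    # A partial write is possible too, but the whole record is
    # still rewritten on disk (and shipped to replicas).
    d.put("key", "y" * 10, dlen=10, doff=0)

    d.close()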

Meanwhile, the usual approach would be to break the huge object by hand
and store each fragment as a separate record in the database. The key
for each record could be the file offset, using "set_range()" to "seek"
when reading, and/or a cursor to stream easily and fast (if you use
btree).
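
For example (untested sketch, again with bsddb3; FRAGMENT_SIZE and the
fixed-width big-endian key encoding are my choices, so that byte
offsets sort correctly in the btree):

    import struct
    from bsddb3 import db

    FRAGMENT_SIZE = 4096

    def offset_key(offset):
        # 8-byte big-endian keys keep btree order == offset order.
        return struct.pack(">Q", offset)

    def store(d, data):
        for off in range(0, len(data), FRAGMENT_SIZE):
            d.put(offset_key(off), data[off:off + FRAGMENT_SIZE])

    def read_from(d, offset):
        # "seek": round down to the fragment holding 'offset',
        # position a cursor there with set_range(), then stream
        # fragments; this is the file-like read() loop.
        start = offset - (offset % FRAGMENT_SIZE)
        skip = offset - start
        c = d.cursor()
        rec = c.set_range(offset_key(start))
        while rec is not None:
            key, fragment = rec
            yield fragment[skip:]
            skip = 0
            rec = c.next()
        c.close()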

You need to be aware of the usage patterns to do the right design. A
read/write approach will be very different from a write-once, read-many
environment, or an append-only configuration, or a single-threaded
versus heavily multithreaded application. Oracle's Berkeley DB
documentation is VERY GOOD; read it.

I would suggest you study Durus and my Berkeley DB backend for it if
you really plan to store a huge number of objects, or huge objects
(internally fragmented into several small objects). It provides an
object-oriented view, handles object caching, etc.

For example, I use this technique in my LMTP/POP3 server: each mailbox
has several messages, and each message is a btree indexed by file
offset, where each fragment is "more or less" 4 kilobytes in size
(rounded to the line ending).
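
The rounding rule is simply this (illustrative sketch, not the actual
server code):

    TARGET = 4096

    def split_message(text):
        # Cut into fragments of roughly TARGET bytes, moving each
        # cut forward to the next line ending so lines stay intact.
        fragments = []
        start = 0
        while start < len(text):
            end = start + TARGET
            if end < len(text):
                nl = text.find("\n", end)
                end = len(text) if nl == -1 else nl + 1
            fragments.append(text[start:end])
            start = end
        return fragments

Each fragment then goes into the message's btree keyed by its starting
offset, as above.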

Currently managing about 2.9 terabytes, and counting... :)

A client of mine uses this very same technique to manage medical
imaging: currently 222 terabytes and growing, in a single (distributed)
database, using Durus + my Berkeley DB backend + some private code for
distributed transactions and replication + Berkeley DB.

--
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
.                              _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


