[pybsddb] Making calls to keys() more efficient (or, how to do inequality gets on keys?)

Jeremy Fishman jeremy.r.fishman at gmail.com
Thu Jun 10 18:13:11 CEST 2010


> Maybe you can use memcached (an LRU) and not bdb (a key-value store).


FYI, memcached specifically limits the number of keys you can query from a
bucket in one go (to a 2MB response), and does not support range queries (on
keys), so it is often not possible to dump the entire cache's keys.

e.g. `stats cachedump`ing keys from the buckets of a table with 3643788
entries yields only 163452 keys

$ memcached-tool localhost dump | grep -a ^add | wc -l
Dumping memcache contents
  Number of buckets: 5
  Number of items  : 3643788
Dumping bucket 1 - 2248149 total items
Dumping bucket 2 - 1094868 total items
Dumping bucket 3 - 275682 total items
Dumping bucket 4 - 24911 total items
Dumping bucket 5 - 178 total items
163452
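
In other words, the dump recovered only a small fraction of the table's keys:

```python
# Numbers from the run above: keys returned by `stats cachedump`
# versus total items reported by memcached.
dumped, total = 163452, 3643788
pct = 100.0 * dumped / total
assert round(pct, 1) == 4.5  # roughly 4.5% of the keys
```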

Cheers
  - Jeremy

On Thu, Jun 10, 2010 at 1:05 AM, lasizoillo <lasizoillo at gmail.com> wrote:

> 2010/6/9 Denis Papathanasiou <denis.papathanasiou at gmail.com>:
> > I know that using keys() can be inefficient, as per the warnings in the
> > docs.
>
> The docs say:
> Warning: this method traverses the entire database so it can possibly
> take a long time to complete.
>
> possibly taking a long time != inefficient
>
> But it is an interesting warning.
>
> If you read the whole database under read locks:
>  * You block all write operations for a *possibly long time*
>  * You can exhaust the lock space.
>
> A relaxed ACID configuration can cause data corruption. A fully ACID
> configuration can cause *possibly long* delays.
>
> >
> > I do need to call keys() in the following scenario, though, so I was
> > wondering if there was anything I could do to make it run faster:
> >
> > A db that stores temporary session data has as its key a guid, prefixed
> > by a unix timestamp. The associated value is just another guid
> > identifier (which is unrelated to the one in the key).
> >
> > Since the session data is temporary, I have a function which examines all
> > the keys, and removes every key/value pair older than n seconds (it's a
> > simple calc using the unix timestamp prefix in the key).
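
One detail worth checking in this layout (an aside, not raised in the thread): bdb btrees compare keys byte-wise, so the timestamp prefix only sorts chronologically if it is fixed-width. A quick illustration, with make_key as a hypothetical helper:

```python
import uuid

def make_key(ts):
    # zero-pad the timestamp to 10 digits so byte-wise order
    # matches numeric order
    return b"%010d-%s" % (ts, uuid.uuid4().hex.encode("ascii"))

# unpadded, b"999..." sorts *after* b"1000..."; zero-padded, it sorts first
assert sorted([b"999-x", b"1000-x"]) == [b"1000-x", b"999-x"]
assert sorted([make_key(999), make_key(1000)])[0].startswith(b"0000000999-")
```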
> >
> > I realize that I could restructure the db so that the unix timestamp is
> > part of the value, and by adding a secondary index on it, I could get
> > all the keys older than n.
> >
> > But, restructuring it and doing all the tests will take time, so in the
> > interim, is there anything I can do to make using keys() more efficient?
> >
>
> Fetching all keys => O(n)
> Fetching one key in a btree => O(log n)
>
> > Alternatively, is there any way to simply get all the keys greater
> > than n?
> >
>
> Using cursors.
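
To make the cursor idea concrete: with the timestamp-prefixed keys from the scenario above, a btree cursor can seek to a cutoff with set_range() (smallest key >= the given one, an O(log n) seek) and walk forward with next(). A sketch of both operations, using a small in-memory stand-in for an open DB_BTREE handle; FakeBTreeCursor, purge_expired and keys_since are illustrative names, not part of any bdb API, and real bsddb3 cursors differ in detail (on a miss they return None or raise DBNotFoundError, depending on version):

```python
import bisect
import uuid

class FakeBTreeCursor:
    """In-memory stand-in for a bsddb3 DB_BTREE cursor (illustrative only).

    Mimics the calls a range scan needs: first(), next(), set_range(),
    delete(), returning (key, value) pairs or None when no record matches.
    """
    def __init__(self, store):
        self._store = store          # dict: key bytes -> value bytes
        self._keys = sorted(store)   # btree keys are byte-ordered
        self._pos = -1

    def first(self):
        self._pos = 0
        return self._current()

    def next(self):
        self._pos += 1
        return self._current()

    def set_range(self, key):
        # seek to the smallest key >= `key` (what a btree does in O(log n))
        self._pos = bisect.bisect_left(self._keys, key)
        return self._current()

    def delete(self):
        del self._store[self._keys.pop(self._pos)]
        self._pos -= 1  # so next() lands on the following record

    def _current(self):
        if 0 <= self._pos < len(self._keys):
            k = self._keys[self._pos]
            return k, self._store[k]
        return None

def purge_expired(cursor, cutoff_ts):
    """Delete every record whose zero-padded timestamp prefix < cutoff_ts."""
    cutoff = b"%010d" % cutoff_ts
    rec = cursor.first()
    while rec is not None and rec[0] < cutoff:
        cursor.delete()
        rec = cursor.next()

def keys_since(cursor, ts):
    """Yield every key >= the timestamp prefix (the range query)."""
    rec = cursor.set_range(b"%010d" % ts)
    while rec is not None:
        yield rec[0]
        rec = cursor.next()

store = {b"%010d-" % ts + uuid.uuid4().hex.encode(): b"session"
         for ts in (100, 200, 300, 400)}
purge_expired(FakeBTreeCursor(store), 300)
assert len(store) == 2                                # ts 100, 200 removed
assert sorted(store)[0].startswith(b"0000000300-")
assert len(list(keys_since(FakeBTreeCursor(store), 300))) == 2
```

The same loop shapes apply to a real cursor obtained from an open btree handle; the seek replaces the full keys() traversal with a positioned scan over just the range you need.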
>
> > It seems beyond the scope of bdb, but I was curious if I'd missed
> > anything.
>
> Take a look at the bdb reference guide:
>
> http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/index.html
>
> You can search for LRU papers for more implementation details.
>
> Maybe you can use memcached (an LRU) and not bdb (a key-value store).
>
> Regards,
>
> Javi
> _______________________________________________
> pybsddb mailing list
> pybsddb at jcea.es
> https://mailman.jcea.es:28443/listinfo/pybsddb
> http://www.jcea.es/programacion/pybsddb.htm
>

