[pybsddb] Making calls to keys() more efficient (or, how to do inequality gets on keys?)

lasizoillo lasizoillo at gmail.com
Thu Jun 10 10:05:32 CEST 2010


2010/6/9 Denis Papathanasiou <denis.papathanasiou at gmail.com>:
> I know that using keys() can be inefficient, as per the warnings in the
> docs.

The docs say:
Warning: this method traverses the entire database so it can possibly
take a long time to complete.

possibly taking a long time != inefficient

But it is an interesting warning.

If you read the whole database while holding read locks:
 * You are blocking all write operations for a *possibly long time*
 * You can exhaust the lock space.

Relaxing the ACID guarantees can cause data corruption. Keeping full
ACID guarantees can cause *possibly long* delays.

>
> I do need to call keys() in the following scenario, though, so I was
> wondering if there was anything I could do to make it run faster:
>
> A db that stores temporary session data has as its key a guid, prefixed by a
> unix timestamp. The associated value is just another guid identifier (which
> is unrelated to the one in the key).
>
> Since the session data is temporary, I have a function which examines all
> the keys, and removes every key/value pair older than n seconds (it's a
> simple calc using the unix timestamp prefix in the key).
>
> I realize that I could restructure the db so that the unix timestamp is part
> of the value, and by adding a secondary index on it, I could get all the
> keys older than n.
>
> But, restructuring it and doing all the tests will take time, so in the
> interim, is there anything I can do to make using keys() more effective?
>

Reading all keys => O(n)
Looking up one key in a btree => O(log n)
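
To make the difference concrete, here is a rough sketch in plain Python,
with a sorted list standing in for the btree. The key layout (a 10-digit
unix timestamp, a dash, then the guid) and the function names are
assumptions for illustration, not your actual schema.

```python
import bisect

# Stand-in for a btree: the keys are kept in sorted order.
# Assumed key layout: 10-digit unix timestamp + "-" + guid.
keys = sorted([
    "1276070000-aaaa",
    "1276080000-bbbb",
    "1276090000-cccc",
    "1276100000-dddd",
])


def expired_by_scan(keys, cutoff):
    """O(n): inspect every key, as a loop over keys() does."""
    return [k for k in keys if int(k.split("-", 1)[0]) < cutoff]


def expired_by_search(keys, cutoff):
    """O(log n) to find the boundary; only expired keys are returned."""
    boundary = bisect.bisect_left(keys, "%010d" % cutoff)
    return keys[:boundary]
```

Both functions return the same keys. The second one relies on the
timestamps all having the same number of digits, so that string order
matches numeric order.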

> Alternatively, is there any way to simply get all the keys greater than n?
>

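Yes, using cursors. A btree keeps its keys sorted, so a cursor can be
positioned at the first key >= n with set_range() and then stepped
forward with next().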
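
A minimal sketch of such a range scan follows. keys_from() only assumes
an object that behaves like a DBCursor on a btree: set_range() and
next() return a (key, value) pair, and "nothing left" is signalled
either by None or by a KeyError subclass (that error handling is my
assumption; check it against your pybsddb version). ListCursor is a
hypothetical in-memory stand-in so that the sketch runs without a
database; the keys are made up.

```python
import bisect


def keys_from(cursor, start_key):
    """Yield every key >= start_key, in sorted (btree) order."""
    try:
        record = cursor.set_range(start_key)
        while record is not None:
            yield record[0]
            record = cursor.next()
    except KeyError:
        # Assumption: "not found" may be raised instead of returned as None.
        return


class ListCursor:
    """Hypothetical stand-in for a DBCursor, for illustration only."""

    def __init__(self, items):
        self._items = sorted(items)
        self._pos = len(self._items)

    def _current(self):
        if self._pos < len(self._items):
            return self._items[self._pos]
        return None

    def set_range(self, key):
        # Position at the smallest stored key >= key.
        self._pos = bisect.bisect_left(self._items, (key,))
        return self._current()

    def next(self):
        self._pos += 1
        return self._current()


cursor = ListCursor([
    (b"1276070000-aaaa", b"v1"),
    (b"1276080000-bbbb", b"v2"),
    (b"1276090000-cccc", b"v3"),
])
recent = list(keys_from(cursor, b"1276080000"))
```

With a real database you would pass the result of db.cursor() instead
of ListCursor. To expire old sessions the walk goes the other way:
start at first() and stop as soon as a key reaches the cutoff.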

> It seems beyond the scope of bdb, but I was curious if I'd missed anything.

Take a look at the bdb reference guide:
http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/index.html

You can search for papers on LRU for more implementation details.

Maybe you can use memcached (an LRU cache) instead of bdb (a key-value store).

Regards,

Javi


