[pybsddb] Making calls to keys() more efficient (or, how to do inequality gets on keys?)
lasizoillo
lasizoillo at gmail.com
Thu Jun 10 10:05:32 CEST 2010
2010/6/9 Denis Papathanasiou <denis.papathanasiou at gmail.com>:
> I know that using keys() can be inefficient, as per the warnings in the
> docs.
The docs say:
Warning: this method traverses the entire database so it can possibly
take a long time to complete.
"possibly takes a long time" != "inefficient"
But it is an interesting warning.
If you read the whole database while holding read locks:
* You block all write operations for a *possibly long time*.
* You can exhaust the lock space.
Relaxing the ACID guarantees can cause data corruption. Full ACID
guarantees can cause *possibly long* delays.
>
> I do need to calls keys() in the following scenario, though, so I was
> wondering if there was anything I could do to make it run faster:
>
> A db that stores temporary session data has as its key a guid, prefixed by a
> unix timestamp. The associated value is just another guid identifier (which
> is unrelated to the one in the key).
>
> Since the session data is temporary, I have a function which examines all
> the keys, and removes every key/value pair older than n seconds (it's a
> simple calc using the unix timestamp prefix in the key).
>
> I realize that I could restructure the db so that the unix timestamp is part
> of the value, and by adding a secondary index on it, I could get all the
> keys older than n.
>
> But, restructuring it and doing all the tests will take time, so in the
> interim, is there anything I can do to make using keys() more effective?
>
Reading all keys => O(n)
Looking up one key in a btree => O(log n)
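To make the difference concrete, here is a rough sketch in which a sorted Python list stands in for the btree's key order. The key layout (a 10-digit unix timestamp followed by a guid-like suffix) is only my assumption about your data, and the list is an illustration, not bdb itself.

```python
import bisect

# Sorted keys stand in for the btree's key order.
# Assumed layout: 10-digit Unix timestamp + "-guidNNNN" (hypothetical).
keys = sorted("%010d-guid%04d" % (1276000000 + i, i) for i in range(1000))

# Everything with a timestamp below this value counts as expired.
cutoff = "%010d" % (1276000000 + 250)

# O(n): examine every key, which is what a keys() based cleanup does.
expired_scan = [k for k in keys if k < cutoff]

# O(log n) to find the boundary by binary search; after that only the
# expired keys themselves are touched.
boundary = bisect.bisect_left(keys, cutoff)
expired_seek = keys[:boundary]
```

Both approaches select the same keys, but the second one never looks at the sessions that are still valid. This only works because fixed-width timestamps sort in time order when compared as strings.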
> Alternatively, is there any way to simply get all the keys greater than n?
>
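Use a cursor. On a btree, set_range() positions the cursor at the
smallest key greater than or equal to the one you give, and next()
walks forward from there.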
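A minimal sketch of the cursor approach, under some assumptions I cannot check from here: the database uses the btree access method with the default byte-wise key ordering, keys start with a 10-digit unix timestamp, and the cursor returns None (rather than raising DBNotFoundError) when there is no record, which depends on the set_get_returns_none setting. ListCursor is a hypothetical in-memory stand-in so the code runs without Berkeley DB; split_key, expire_older_than and keys_from are names I made up.

```python
import bisect


def split_key(key):
    """Assumed key layout: 10-digit Unix timestamp followed by the guid."""
    return int(key[:10]), key[10:]


def expire_older_than(cursor, cutoff):
    """Delete records whose timestamp prefix is below `cutoff`.

    `cursor` should behave like a pybsddb DBCursor on a btree database:
    first()/next() return a (key, data) tuple or None, and delete()
    removes the record the cursor currently points at.
    """
    removed = 0
    rec = cursor.first()
    while rec is not None:
        timestamp, _guid = split_key(rec[0])
        if timestamp >= cutoff:
            break  # keys are sorted, so every later key is newer
        cursor.delete()
        removed += 1
        rec = cursor.next()
    return removed


def keys_from(cursor, start_key):
    """Yield every key >= start_key (the 'inequality get')."""
    # With other set_get_returns_none settings, set_range() may raise
    # DBNotFoundError instead of returning None; catch it if needed.
    rec = cursor.set_range(start_key)
    while rec is not None:
        yield rec[0]
        rec = cursor.next()


class ListCursor:
    """Hypothetical stand-in for a DBCursor, backed by a sorted list."""

    def __init__(self, records):
        self.items = sorted(records.items())
        self.pos = -1

    def _current(self):
        if 0 <= self.pos < len(self.items):
            return self.items[self.pos]
        return None

    def first(self):
        self.pos = 0
        return self._current()

    def next(self):
        self.pos += 1
        return self._current()

    def set_range(self, key):
        # (key,) sorts before any (key, value) pair with the same key.
        self.pos = bisect.bisect_left(self.items, (key,))
        return self._current()

    def delete(self):
        del self.items[self.pos]
        self.pos -= 1  # so that next() lands on the following record


demo = ListCursor({"1276000000aaa": "s1", "1276000100bbb": "s2"})
print(expire_older_than(demo, 1276000050), "record(s) expired")
```

With pybsddb you would pass the object returned by the cursor() method of a DB opened with DB_BTREE in place of ListCursor. The cleanup then stops at the first key that is still valid instead of traversing the entire database. Keep the locking remarks above in mind if you delete many records in one go.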
> It seems beyond the scope of bdb, but I was curious if I'd missed anything.
Take a look at the bdb reference guide:
http://www.oracle.com/technology/documentation/berkeley-db/db/programmer_reference/index.html
You can search for papers on LRU for more implementation details.
Maybe you can use memcached (an LRU cache) instead of bdb (a key-value store).
Regards,
Javi