[pybsddb] Scanning Secondary Databases with Unknown Values

Wed Sep 23 16:41:52 CEST 2009

I have a db with a secondary database on a date field, which is a
string representation of unix time.

Based on my prior exploration of this topic
(http://mailman.argo.es/pipermail/pybsddb/2008-December/000132.html),
I'm using the following function to find all keys greater than or equal
to a specific time (for this example, the conversion function is
something that takes the date string and transforms it into a number,
so the inequality can work):

def _greater_than (db_con, *args):
    """Lookup and return a list of data item keys which are greater
    than or equal to the given value"""
    val = args[0]
    attribute = args[1]
    conversion_fn = args[2]
    amount = conversion_fn(val)
    data = []
    if db_con and secondary_indices.has_key(attribute):
        cur = secondary_indices[attribute].cursor()
        # rec is in 3 parts: (indexed value, key, value)
        rec = cur.pget(val, db.DB_SET_RANGE)
        if rec:
            data.append(rec[1])
        while rec:
            rec = cur.pget(val, db.DB_NEXT)
            if not rec:
                break
            if conversion_fn(rec[0]) >= amount:
                data.append(rec[1])
        cur.close()
    return filter(None, data)

The function works if the val argument passed is "0", because then the
call to cur.pget(val, db.DB_SET_RANGE) simply starts at the first item,
and the loop iterates forward from there.

Similarly, if val happens to match an actual date value of one of the
items (e.g., in my data, there is an item whose date value is
"1253174409"), then the function works as expected.

But the majority of the calls to this function are using values which
do not match any of the items, and in that case, the call to pget()
returns None, and the function fails (i.e., returns no keys when in
fact there were items whose date was later than val).
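
For reference, DB_SET_RANGE on a Btree is documented to position the
cursor at the smallest key greater than or equal to the one given, using
the database's key comparison (byte-wise lexical by default, which
agrees with numeric order only while all date strings have the same
number of digits). Below is a minimal pure-Python sketch of that
expected lookup, not of what bsddb does internally. The sorted list
stands in for the secondary index, and set_range and all sample keys
except "1253174409" are hypothetical.

```python
import bisect

def set_range(sorted_keys, val):
    """Return the smallest key >= val, or None if there is none.

    Stand-in for the documented DB_SET_RANGE behaviour on a Btree;
    sorted_keys plays the role of the secondary index.
    """
    i = bisect.bisect_left(sorted_keys, val)
    return sorted_keys[i] if i < len(sorted_keys) else None

# Hypothetical date keys; only "1253174409" appears in the original data.
# They are compared as strings, as a default Btree would compare them.
keys = sorted(["1253000000", "1253174409", "1253700000"])

exact = set_range(keys, "1253174409")     # val matches a stored key
between = set_range(keys, "1253100000")   # val falls between stored keys
past_end = set_range(keys, "1254000000")  # val is larger than every key
```

Under these semantics only the last case should come back empty; a val
that falls between two stored dates should still land on the next later
item.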

Is there any way around this problem?