[pybsddb] The BTREE comparison function gets truncated data

xcorat xcorat at gmail.com
Fri Sep 13 20:20:59 CEST 2013


Well, it turned out that I hadn't called set_bt_compare when creating
the database. I did that, and now everything seems to work fine. It would
have been nicer if it had given me either a warning or an error instead of
bizarre data, though.
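
For anyone finding this thread later, here is a minimal sketch of the kind of
comparison callback discussed here. It is written for Python 3 with pickle
rather than the cPickle used in the original code; compare_pickled and the
sample dates are made-up names and values for illustration only. As far as I
understand the Berkeley DB documentation, the callback has to be installed
before the database is opened, every time it is opened, including the open
that creates it. The registration is only indicated in a comment because it
needs a real database handle.

```python
import pickle
from datetime import datetime


def compare_pickled(left, right):
    """Three-way comparison of two pickled keys.

    Returns a negative number, zero, or a positive number, which is the
    contract a BTREE comparison callback is expected to follow.
    """
    a = pickle.loads(left)
    b = pickle.loads(right)
    return (a > b) - (a < b)


# Hypothetical sample keys, pickled the way the thread describes.
early = pickle.dumps(datetime(2013, 8, 31, 23, 31, 31))
late = pickle.dumps(datetime(2013, 9, 4, 22, 52, 50))

print(compare_pickled(early, late))  # -1

# Assumption: with pybsddb the callback would be registered on the DB handle
# before open(), both when creating and when reopening the database
# (the method is spelled set_bt_compare in the pybsddb docs, as far as I know):
#     db.set_bt_compare(compare_pickled)
#     db.open(...)
```

The callback only works if it is handed complete keys, which is why it has to
be in place from the moment the database is created.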

Thank you,
Sachi.


On Thu, Sep 12, 2013 at 12:52 AM, Francisco Olarte
<folarte at peoplecall.com> wrote:

> Hi:
>
> I'm not sure if this is what is happening there, but it's common to use
> truncated keys in intermediate nodes of btree data structures, as they are
> enough to know which of the subnodes a key goes in, and also to eliminate a
> common prefix if all of them share it. In fact Berkeley DB manual states:
>
> >>>> from
> http://docs.oracle.com/cd/E17076_03/html/programmer_reference/bt_conf.html#am_conf_bt_prefix
> Btree prefix comparison
>
> The Berkeley DB Btree implementation maximizes the number of keys that can
> be stored on an internal page by storing only as many bytes of each key as
> are necessary to distinguish it from adjacent keys. The prefix comparison
> routine is what determines this minimum number of bytes (that is, the
> length of the unique prefix), that must be stored. A prefix comparison
> function for the Btree can be specified by calling DB->set_bt_prefix()
> <http://docs.oracle.com/cd/E17076_03/html/api_reference/C/dbset_bt_prefix.html>.
>
> <<<<
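
The quoted passage can be made concrete with a small sketch. The function
below is a plain-Python model of a byte-wise prefix routine as the
documentation describes it: it returns how many leading bytes of the second
key are needed to tell it apart from the first. The name prefix_len and the
neighbouring key are assumptions made up for illustration; only the second
key comes from this thread. This is not the library's code, just a model of
the described behaviour.

```python
def prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes of b needed to distinguish it from a.

    Assumes a sorts before b in plain byte order.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # The first differing byte is enough to separate the keys.
            return i + 1
    # One key is a prefix of the other, or the keys are equal.
    if len(a) < len(b):
        return len(a) + 1
    if len(b) < len(a):
        return len(b) + 1
    return len(b)


left = b"1378934636186548.8"   # hypothetical neighbouring key
right = b"1378934636286548.8"  # key reported in this thread

n = prefix_len(left, right)
print(right[:n])  # b'13789346362'
```

With that hypothetical neighbour, the shortened key is exactly the truncated
value reported further down in the thread, which is consistent with a
byte-wise prefix routine being applied to keys that are not meant to be
compared byte by byte.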
>
> Reading on tells you more about this. Overall, I think that
> you cannot use pickled keys if you use a comparison function which
> unpickles them, or else you need to set a subkey extraction function which
> states that it needs the full key to distinguish them (as, if I recall
> correctly, bsddb is using one appropriate for strings).
>
> As an aside, I do not remember exactly what is in a datetime, but to use
> timestamps as keys in a string-oriented btree it's normally best to
> translate them to something like Java millisecond values. Make them an
> unsigned long integer offset from an adequate base value (you can use
> whatever is appropriate for your app, like the classic Unix epoch or
> 01-01-0000), using whatever precision is good for you, and store them
> big-endian. This gives you a nice small key with nice prefixes to compress
> and can improve the performance of your database.
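
The encoding suggested above might look like the following sketch. The
choices here are assumptions for illustration, not something fixed by the
thread: the Unix epoch as base, millisecond precision, an 8-byte unsigned
field, and the helper names encode_key and decode_key. Timestamps before the
epoch are not handled, since they would not fit an unsigned field.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed base value: the classic Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_key(dt):
    """Pack a timezone-aware datetime as 8 big-endian bytes of milliseconds."""
    delta = dt - EPOCH
    millis = (delta.days * 86_400_000
              + delta.seconds * 1000
              + delta.microseconds // 1000)
    return struct.pack(">Q", millis)  # raises struct.error for pre-epoch dates


def decode_key(key):
    """Inverse of encode_key, accurate to the millisecond."""
    (millis,) = struct.unpack(">Q", key)
    return EPOCH + timedelta(milliseconds=millis)


first = datetime(2013, 8, 31, 23, 31, 31, tzinfo=timezone.utc)
second = datetime(2013, 9, 4, 22, 52, 50, tzinfo=timezone.utc)

# Plain byte-wise ordering of the keys matches chronological ordering.
print(encode_key(first) < encode_key(second))  # True
```

Because the bytes are big-endian and fixed width, the default byte-wise
ordering already sorts the keys chronologically, so no custom comparison
function is needed for them.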
>
> Regards.
>    Francisco Olarte.
>
>
>
>
>
> On Thu, Sep 12, 2013 at 5:35 AM, xcorat <xcorat at gmail.com> wrote:
>
>> I'm pretty sure there are no null bytes, I tried the same using a string
>> of a floating point number.
>>
>> The comparison function gets
>>
>> '13789346362'
>>
>> instead of the expected
>>
>> '1378934636286548.8'
>>
>> I will post a test case soon.
>>
>> Thank you,
>> Sachi
>>
>>
>>
>> On Wed, Sep 11, 2013 at 7:48 PM, Jesus Cea <jcea at jcea.es> wrote:
>>
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> On 11/09/13 21:12, xcorat wrote:
>>> > Hi
>>> >
>>> > I'm not sure where to find the problem. I'm saving pickled
>>> > (cPickle) datetime keys in the database, and set the comparison
>>> > function to load those keys and compare. When I call the set_range
>>> > function, the right_key value the comparison function gets is
>>> > **sometimes** truncated. Left key (the one I send) is always fine.
>>> > The data in the database is fine too.
>>> >
>>> > Ex. Left key
>>> >
>>> "datetime\ndatetime\np1\n(S'\\x07\\xdd\\x08\\x1f\\x17\\x1f\\x1f\\x0f\\x10\\x02'\ntRp2\n."
>>> >
>>> >  right key (truncated)
>>> > "cdatetime\ndatetime\np1\n(S'\\x07\\xdd\\t\\x04\\x1642\\x"
>>> >
>>> > Do you know why? Any fixes?
>>>
>>> Could you verify whether the truncated data has any "\0" (null byte)
>>> in it? Could you post a test case?
>>>
>>> - --
>>> Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
>>> jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
>>> Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
>>> jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
>>> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
>>> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
>>> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.10 (GNU/Linux)
>>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>>
>>> iQCVAwUBUjErjZlgi5GaxT1NAQLwYgP/UCSwGlj6WtarxhX5taQCMLzWf7twvd6X
>>> RGZoEgow2o9YiMWxYbFqDvnDuvgkzUnGqlCj4C0KBmbg059aOTe4VoEFGlJv8eSp
>>> P3jng/xoFPmW1iAhIsZgEXdeH3+X/LQuEyXgayUnkA+JhZcaYp47Kz8Svm+osB9P
>>> QH1Xm1JxGe4=
>>> =zHvB
>>> -----END PGP SIGNATURE-----
>>> _______________________________________________
>>> pybsddb mailing list
>>> pybsddb at jcea.es
>>> https://mailman.jcea.es/listinfo/pybsddb
>>> http://www.jcea.es/programacion/pybsddb.htm
>>
>>
>>
>>
>> --
>> Xcorat :)
>>
>>
>
>
>



-- 
Xcorat :)