[pybsddb] The BTREE comparison function gets truncated data

Francisco Olarte folarte at peoplecall.com
Thu Sep 12 09:52:08 CEST 2013


Hi:

I'm not sure if this is what is happening there, but it's common to use
truncated keys in intermediate nodes of btree data structures, as they are
enough to know which of the subnodes a key goes in, and also to eliminate a
common prefix if all of them share it. In fact, the Berkeley DB manual states:

>>>> from
http://docs.oracle.com/cd/E17076_03/html/programmer_reference/bt_conf.html#am_conf_bt_prefix
Btree prefix comparison

The Berkeley DB Btree implementation maximizes the number of keys that can
be stored on an internal page by storing only as many bytes of each key as
are necessary to distinguish it from adjacent keys. The prefix comparison
routine is what determines this minimum number of bytes (that is, the
length of the unique prefix), that must be stored. A prefix comparison
function for the Btree can be specified by calling DB->set_bt_prefix()
(http://docs.oracle.com/cd/E17076_03/html/api_reference/C/dbset_bt_prefix.html).

<<<<

Reading further there tells you more about this. Overall I think that you
cannot use pickled keys if you use a comparison function which unpickles
them, since it may be handed a truncated prefix that is no longer a valid
pickle; otherwise you need to set a prefix (subkey extraction) function
which states it needs the full key to distinguish them (as, if I recall
correctly, bsddb is using one appropriate for strings).
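
To make the truncation concrete, here is a minimal pure-Python sketch of a
byte-wise prefix rule of the kind the manual describes. It does not call
bsddb at all; the name default_prefix_len and the neighbouring key are
hypothetical, chosen only so the result matches the truncated value reported
below, and I am assuming the default behaves roughly like this.

```python
import datetime
import pickle


def default_prefix_len(a: bytes, b: bytes) -> int:
    """Bytes of b needed to tell it apart from the smaller key a.

    Hypothetical helper modelled on a plain byte-wise prefix rule;
    it is not part of the bsddb API.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1  # keep everything up to the first differing byte
    # One key is a prefix of the other: one extra byte settles the order.
    if len(a) < len(b):
        return len(a) + 1
    if len(b) < len(a):
        return len(b) + 1
    return len(b)


# Hypothetical neighbour that differs from the real key at the 11th byte.
prev_key = b"1378934636186548.8"
key = b"1378934636286548.8"

n = default_prefix_len(prev_key, key)
truncated = key[:n]  # what an internal page would need to store

# A pickled key cut short is no longer a complete pickle, so a comparison
# function that unpickles its arguments fails on it.
blob = pickle.dumps(datetime.datetime(2013, 9, 12, 9, 52, 8), protocol=0)
try:
    pickle.loads(blob[:-5])
    unpickle_ok = True
except Exception:
    unpickle_ok = False
```

With these inputs the stored prefix is only the first 11 bytes of the key,
which is enough to route a lookup but not enough to rebuild the original
value.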

As an aside, I do not remember exactly what is in a datetime, but to use
timestamps as keys in a string-oriented btree it's normally best to
translate them to something like a Java milliseconds value. Make them an
unsigned long integer offset from an adequate origin (you can use whatever
is appropriate for your app, like the classic Unix epoch or 01-01-0000),
using whatever precision is good for you, and store them big-endian. This
gives you a nice small key with nice prefixes to compress, and it can
improve the performance of your database.
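
A rough sketch of that encoding, assuming the Unix epoch as origin,
millisecond precision and timezone-aware UTC datetimes. The names
datetime_to_key and key_to_datetime are just illustrative, not anything
from bsddb.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumed origin: the Unix epoch. Any fixed origin works if used consistently.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_to_key(dt: datetime) -> bytes:
    """Encode an aware datetime as an 8-byte big-endian millisecond count."""
    millis = (dt - EPOCH) // timedelta(milliseconds=1)
    # ">Q" is an unsigned 64-bit big-endian integer; dates before the
    # origin give a negative count and raise struct.error.
    return struct.pack(">Q", millis)


def key_to_datetime(key: bytes) -> datetime:
    """Decode a key produced by datetime_to_key."""
    (millis,) = struct.unpack(">Q", key)
    return EPOCH + timedelta(milliseconds=millis)


earlier = datetime(2013, 9, 11, 19, 48, tzinfo=timezone.utc)
later = datetime(2013, 9, 12, 9, 52, 8, tzinfo=timezone.utc)
earlier_key = datetime_to_key(earlier)
later_key = datetime_to_key(later)
```

Because the integers are fixed width and big-endian, plain byte-wise
comparison of the keys gives chronological order, so no custom comparison
function is needed and a truncated prefix still compares correctly.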

Regards.
   Francisco Olarte.





On Thu, Sep 12, 2013 at 5:35 AM, xcorat <xcorat at gmail.com> wrote:

> I'm pretty sure there are no null bytes, I tried the same using a string
> of a floating point number.
>
> the comparison function gets,
>
> '13789346362'
>
> instead of supposed
>
> '1378934636286548.8'
>
> I will post a test case soon.
>
> Thank you,
> Sachi
>
>
>
> On Wed, Sep 11, 2013 at 7:48 PM, Jesus Cea <jcea at jcea.es> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 11/09/13 21:12, xcorat wrote:
>> > Hi
>> >
>> > I'm not sure where to find the problem. I'm saving pickled
>> > (cPickle) datetime keys in the database, and set the comparison
>> > function to load those keys and compare. When I call the set_range
>> > function, the right_key value the comparison function gets is
>> > **sometimes** truncated. Left key (the one I send) is always fine.
>> > The data in the database is fine too.
>> >
>> > Ex. Left key
>> >
>> "datetime\ndatetime\np1\n(S'\\x07\\xdd\\x08\\x1f\\x17\\x1f\\x1f\\x0f\\x10\\x02'\ntRp2\n."
>> >
>> >  right key (truncated)
>> > "cdatetime\ndatetime\np1\n(S'\\x07\\xdd\\t\\x04\\x1642\\x"
>> >
>> > Do you know why? any fixes?
>>
>> Could you possibly verify if the truncated data has any "\0" in it
>> (null byte)? Could you post a testcase?
>>
>> - --
>> Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
>> jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
>> Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
>> jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
>> "Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
>> "My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
>> "El amor es poner tu felicidad en la felicidad de otro" - Leibniz
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.10 (GNU/Linux)
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iQCVAwUBUjErjZlgi5GaxT1NAQLwYgP/UCSwGlj6WtarxhX5taQCMLzWf7twvd6X
>> RGZoEgow2o9YiMWxYbFqDvnDuvgkzUnGqlCj4C0KBmbg059aOTe4VoEFGlJv8eSp
>> P3jng/xoFPmW1iAhIsZgEXdeH3+X/LQuEyXgayUnkA+JhZcaYp47Kz8Svm+osB9P
>> QH1Xm1JxGe4=
>> =zHvB
>> -----END PGP SIGNATURE-----
>> _______________________________________________
>> pybsddb mailing list
>> pybsddb at jcea.es
>> https://mailman.jcea.es/listinfo/pybsddb
>> http://www.jcea.es/programacion/pybsddb.htm
>
>
>
>
> --
> Xcorat :)
>