[pybsddb] Data corruption bug in latest bsddb3/berkeleydb packages
Jesus Cea
jcea at jcea.es
Wed May 7 00:51:56 CEST 2025
On 6/5/25 23:00, Jacob Henner wrote:
> I believe I've encountered a data corruption bug in the latest bsddb3
> and berkeleydb packages. It's possible that it also affects other
> versions, but I've only tested with the latest.
Please use berkeleydb for your test code. bsddb3 is legacy and no longer
supported.
That said, your code doesn't check anything at all. What am I supposed to
see there? What data corruption are you seeing?
Your description is a bit confusing, but reading between the lines I
understand that after you write data to a database, you inspect the
file's binary contents (via "raw" tools, not the DB interface) and find
content unrelated to the data you stored. If that is the case, it is not
"data corruption" but "data leakage".
Please confirm whether that is the case, so that you and I are on the
same page.
Have you tried to reproduce this using the Berkeley DB C interface?
Both the Berkeley DB C library and my Python bindings do reuse buffers,
so I don't find it strange that, when a partial page is written, the
portion that is not overwritten keeps its old content. That stale data is
present in the file, but it is not accessible via DB calls. This is no
different from "deleting" a file and still seeing its content when you
examine the raw blocks on disk. That is how "undelete" worked in the old
days.
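To make the buffer-reuse effect concrete, here is a toy model (not Berkeley DB's real page layout; `ToyPage`, `PAGE_SIZE` and the `zero_fill` flag are all hypothetical). A fixed-size buffer is reused without clearing it, so a shorter write leaves the tail of the previous record in the raw bytes, while a "DB-level" read only ever returns the live portion:

```python
PAGE_SIZE = 32  # hypothetical page size, far smaller than a real Berkeley DB page


class ToyPage:
    """Toy model of a fixed-size page buffer that is reused between writes.

    This is NOT how Berkeley DB lays out pages; it only mimics the effect of
    reusing a buffer without clearing it.
    """

    def __init__(self, size: int = PAGE_SIZE) -> None:
        self.buf = bytearray(size)  # starts zeroed, like a freshly allocated page
        self.used = 0               # bytes the "DB layer" considers live

    def write(self, record: bytes, zero_fill: bool = False) -> None:
        if len(record) > len(self.buf):
            raise ValueError("record larger than page")
        # Only the prefix is overwritten; the tail keeps whatever was there.
        self.buf[: len(record)] = record
        if zero_fill:
            # Hypothetical option: scrub the unused tail of the page.
            self.buf[len(record):] = bytes(len(self.buf) - len(record))
        self.used = len(record)

    def read(self) -> bytes:
        """What a DB-level read returns: only the live bytes."""
        return bytes(self.buf[: self.used])

    def raw(self) -> bytes:
        """What a raw dump of the page shows: every byte, stale or not."""
        return bytes(self.buf)


page = ToyPage()
page.write(b"SECRET-OLD-RECORD-0123456789")
page.write(b"new")
print(page.read())                  # b'new'
print(b"0123456789" in page.raw())  # True: stale tail is still on the "page"
```

The stale bytes are only reachable through `raw()`, never through `read()`, which is the distinction between "visible in the file" and "visible through the DB interface".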
> [This code][1] writes a BerkeleyDB database to a temporary file, and
> then manipulates that file using bsddb3/berkeleydb. When examining the
> output, I've noticed that it includes unexpected data. The unexpected
> data is recognizable as data that the program reads in other sections,
> but it's data that is not accessible within the scope of the code
> manipulating the database file. This leads me to believe there might be
> a buffer/memory management issue within the native extension.
Just buffer reuse. Business as usual.
If your database contains, for example, a million records and you delete
them, you will see no data through the DB interface, but you will still
see all that old content in the raw database file. That is the way things
work [*].
[*] Zero-filling free space would be possible, and it could perhaps be a
configuration option. But if the concern is that an attacker can see
"stale" data by examining the raw file, she could have done the same
through the regular DB interface while that data was live and available.
It would be dangerous if the stale data you see had never been present in
the database. Leaking unrelated application data would be quite ugly.
So, my question is:
Is the "stale" data you are seeing data that was previously present in
the database, or data leaked from the application and completely
unrelated to the DB? This is the critical distinction we must determine.
In the first case, that is expected. In the second case, we have a problem.
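One way to answer that question is to tag data with unique sentinel strings: some that the test program stores through the DB API, and others that only ever live in application memory and are never passed to the DB. Then search the raw file for both. The helper below is a hypothetical sketch of that check (the names and the "leak"/"stale"/"clean" labels are illustrative, not part of berkeleydb):

```python
from pathlib import Path


def classify_stale_data(raw: bytes, db_markers, app_markers):
    """Report which sentinel markers appear in a raw database file dump.

    db_markers:  byte strings that were stored at some point via the DB API.
    app_markers: byte strings that only lived in application memory and
                 were never handed to the DB.
    """
    found_db = sorted(m for m in db_markers if m in raw)
    found_app = sorted(m for m in app_markers if m in raw)
    if found_app:
        verdict = "leak"   # application-only data reached the file: a real problem
    elif found_db:
        verdict = "stale"  # leftovers of former DB content: expected
    else:
        verdict = "clean"
    return verdict, found_db, found_app


def classify_file(path, db_markers, app_markers):
    """Same check, reading the raw bytes of a database file on disk."""
    return classify_stale_data(Path(path).read_bytes(), db_markers, app_markers)
```

Only a "leak" result would point to the serious case described above; a "stale" result is the expected buffer-reuse behaviour.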
Please be precise with your words: data corruption means that you store
some data and read back something different. That would be bad. Data
leakage means that you see data in the binary file, not through the DB
interface, that is not supposed to be there. That can be bad (if the
leaked data is unrelated to the DB) or harmless (leftovers from
deletions, page splits, and so on).
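Under that definition, a corruption test has to compare what was stored with what is read back through the DB interface. A minimal sketch, assuming only that the store supports `store[key] = value` and `store[key]`; I believe berkeleydb's `DB` handle offers this item access and raises a `KeyError` subclass for missing keys, but treat that as an assumption and check the package documentation. The function name is hypothetical:

```python
def find_corruption(store, records):
    """Write each (key, value) pair through `store`, then read them all back.

    Returns the keys whose read-back value differs from what was written
    (including keys that can no longer be found). An empty list means no
    corruption was observed through the store's own interface.
    """
    for key, value in records.items():
        store[key] = value
    mismatched = []
    for key, expected in records.items():
        try:
            actual = store[key]
        except KeyError:
            actual = None  # a key that vanished also counts as corruption
        if actual != expected:
            mismatched.append(key)
    return mismatched
```

Note that this check never looks at the raw file, so stale bytes left behind on disk cannot make it fail; only a wrong read-back can.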
Try this with any other database, for instance SQLite. You will be
surprised :-)
If you are actually seeing leaked non DB application memory, that would
be serious. Is that the case?
If you DON'T see leaks of private application memory but still think
this is a problem, feel free to raise it again on this mailing list.
Thanks for reaching out.
--
Jesús Cea Avión _/_/ _/_/_/ _/_/_/
jcea at jcea.es - https://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
Twitter: @jcea _/_/ _/_/ _/_/_/_/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"Love is placing your happiness in the happiness of another" - Leibniz