[pybsddb] Data corruption bug in latest bsddb3/berkeleydb packages

Jesus Cea jcea at jcea.es
Wed May 7 00:51:56 CEST 2025


On 6/5/25 23:00, Jacob Henner wrote:
> I believe I've encountered a data corruption bug in the latest bsddb3 
> and berkeleydb packages. It's possible that it also affects other 
> versions, but I've only tested with the latest.

Please use berkeleydb for your test code. bsddb3 is legacy and no
longer supported.

That said, your code doesn't check anything at all. What am I supposed
to see there? What data corruption are you seeing?

Your description is a bit confusing, but reading between the lines I
understand that after writing data to a database you inspect the file's
raw bytes (with "raw" tools, not the DB interface) and find content
unrelated to the data you stored. If that is the case, it is not "data
corruption" but "data leakage".

Please confirm that this is the case, so that you and I are on the
same page.

Have you tried to reproduce this using the Berkeley DB C interface?

The Berkeley DB C library and my Python bindings do reuse buffers, so I
do not find it strange that, when a partial page is written, the portion
that is not overwritten keeps its old content. That stale data is
present in the file but is not accessible via DB calls. This is no
different from "deleting" a file and still seeing its content when you
examine the raw blocks on disk. That is how "undelete" worked in the old
days.
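
To make the mechanism concrete, here is a toy model in Python. It is
NOT Berkeley DB code: the page size, the single shared buffer and the
`zero_fill` switch are all assumptions chosen for illustration. It only
shows why a partially overwritten buffer carries old bytes into the
next page written to the file.

```python
PAGE_SIZE = 32  # toy page size (assumption); real pages are much larger


def write_pages(records, zero_fill=False):
    """Write each record into its own page through one reused buffer.

    Returns the raw bytes of the resulting toy "file".
    """
    buffer = bytearray(PAGE_SIZE)  # allocated once, reused for every page
    raw_file = bytearray()
    for record in records:
        if len(record) > PAGE_SIZE:
            raise ValueError("record does not fit in one page")
        if zero_fill:
            # Hypothetical mitigation: wipe leftovers from the previous page.
            buffer[:] = bytes(PAGE_SIZE)
        buffer[:len(record)] = record  # overwrite only the part we need
        raw_file += buffer             # the whole page goes to the file
    return bytes(raw_file)


raw = write_pages([b"first-record-with-a-long-tail", b"short"])
second_page = raw[PAGE_SIZE:]
# The second page starts with the new record, followed by the tail of
# the first record that was never overwritten.
print(second_page)

clean = write_pages([b"first-record-with-a-long-tail", b"short"], zero_fill=True)
print(clean[PAGE_SIZE:])
```

In the first run the tail of the older record survives in the second
page even though the "database" only holds the short record there; with
`zero_fill=True` the leftover is gone.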

> [This code][1] writes a BerkeleyDB database to a temporary file, and 
> then manipulates that file using bsddb3/berkeleydb. When examining the 
> output, I've noticed that it includes unexpected data. The unexpected 
> data is recognizable as data that the program reads in other sections, 
> but it's data that is not accessible within the scope of the code 
> manipulating the database file. This leads me to believe there might be 
> a buffer/memory management issue within the native extension.

Just buffer reuse. Business as usual.

If your database contains, for example, a million records and you
delete them, you will see no data through the DB interface, but you will
still see all that old content in the raw database file. That is the way
things work [*].

[*] Zero-filling free space would be possible, maybe as a configuration
option, but if the concern is that an attacker can see "stale" data by
examining the raw file, she could have read the same data through the
regular DB interface while it was live and available.

It would be dangerous if the stale data you see had never been present
in the database. Leaking application data unrelated to the database
would be quite ugly.

So, my question would be:

Is the "stale" data you are seeing data that was previously present in
the database, or data leaked from the application and completely
unrelated to the DB? This is the critical distinction we must determine.

In the first case, that is expected. In the second case, we have a problem.
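
One way to answer that question is to scan the raw file for two sets of
byte strings: values that were stored in the database at some point, and
values the application handled but never stored. The sketch below is
only a suggestion. The function name and the sample data are
hypothetical, and it assumes the values appear in the file as plain,
unencrypted and uncompressed bytes.

```python
import os
import tempfile
from pathlib import Path


def classify_raw_file(path, previously_stored, never_stored):
    """Report which known byte strings show up in the raw file."""
    raw = Path(path).read_bytes()
    return {
        # Expected leftovers: data that lived in the database at some point.
        "stale_db_data": [v for v in previously_stored if v in raw],
        # Worrying: application data that was never written to the database.
        "foreign_data": [v for v in never_stored if v in raw],
    }


# Demo on a fake "database file" (made-up content, not a real DB file).
fd, demo_path = tempfile.mkstemp()
os.close(fd)
Path(demo_path).write_bytes(b"\x00header\x00old-value-1\x00\x00live-value\x00")
report = classify_raw_file(
    demo_path,
    previously_stored=[b"old-value-1", b"old-value-2"],
    never_stored=[b"app-secret"],
)
os.remove(demo_path)
print(report)
```

An empty "foreign_data" list does not prove that nothing leaked, since
only the strings you supply are checked, but any hit there would be
worth reporting.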

Please be precise with your words: data corruption means that you store
some data and read back something different. That would be bad. Data
leak means that you are seeing data in the binary file, not through the
DB interface, that is not supposed to be there. That can be bad (if the
leaked data is unrelated to the DB) or harmless (leftovers from
deletions/page splitting/whatever).
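
A corruption test in that sense has to compare what was stored with
what the DB interface returns. A minimal sketch follows. The helper name
is hypothetical and a plain dict stands in for the database here. With
the bindings, I would expect the `get` method of an open database handle
to be passed instead.

```python
def find_corrupted_keys(expected, get):
    """Return {key: (stored, read_back)} for every value that differs.

    `expected` maps keys to the values that were written.
    `get` is any callable that reads a key through the DB interface.
    """
    mismatches = {}
    for key, stored in expected.items():
        read_back = get(key)
        if read_back != stored:
            mismatches[key] = (stored, read_back)
    return mismatches


# Stand-in "database": a dict (assumption, for illustration only).
written = {b"k1": b"v1", b"k2": b"v2"}
healthy_db = dict(written)
broken_db = {b"k1": b"v1", b"k2": b"garbage"}

print(find_corrupted_keys(written, healthy_db.get))
print(find_corrupted_keys(written, broken_db.get))
```

An empty result means no corruption was observed for the keys tested,
whatever the raw file may contain.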

Try this with any other database, for instance, sqlite. You will be 
surprised :-)
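
For the curious, a sketch of that experiment using Python's standard
`sqlite3` module. The marker value, table name and row count are
arbitrary. `secure_delete` is switched off explicitly because some
SQLite builds enable it by default, and in that case the freed space
would be overwritten with zeros.

```python
import os
import sqlite3
import tempfile

MARKER = b"STALE-MARKER-0123456789"  # arbitrary recognizable payload


def deleted_rows_survive_in_raw_file():
    """Insert rows, delete them all, then look for them in the raw file.

    Returns (rows visible through SQL, whether MARKER is in the raw bytes).
    """
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # Make sure freed space is not zeroed, whatever the build default is.
        conn.execute("PRAGMA secure_delete = OFF")
        conn.execute("CREATE TABLE t (payload BLOB)")
        conn.executemany("INSERT INTO t VALUES (?)", [(MARKER,)] * 200)
        conn.commit()
        conn.execute("DELETE FROM t")
        conn.commit()
        visible = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        conn.close()
        with open(path, "rb") as f:
            raw = f.read()
        return visible, MARKER in raw
    finally:
        os.remove(path)


visible_rows, marker_in_raw = deleted_rows_survive_in_raw_file()
print("rows visible through SQL:", visible_rows)
print("marker still in raw file:", marker_in_raw)
```

No rows are visible through SQL after the delete, while the marker is
normally still present in the raw file until the space is reused or the
database is vacuumed.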

If you are actually seeing leaked application memory unrelated to the
DB, that would be serious. Is that the case?

If you DON'T see application private memory leaking, but you still
think this is a problem, feel free to insist on this mailing list.

Thanks for reaching out.

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - https://www.jcea.es/    _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

