[pybsddb] Data corruption bug in latest bsddb3/berkeleydb packages

Jacob Henner JacobHenner at outlook.com
Tue May 6 23:00:44 CEST 2025


Hello,

I believe I've encountered a data corruption bug in the latest bsddb3 and berkeleydb packages. It's possible that it also affects other versions, but I've only tested with the latest.

[This code][1] writes a BerkeleyDB database to a temporary file, and then manipulates that file using bsddb3/berkeleydb. When examining the output, I've noticed that it includes unexpected data. The unexpected data is recognizable as data that the program reads elsewhere, but it is never in scope for the code that manipulates the database file. This leads me to believe there might be a buffer/memory management issue within the native extension.

If the code is run multiple times, the corrupt data varies significantly from run to run.
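One way to quantify that run-to-run variation is to hash the output of each run and count the distinct digests. The following is only a minimal sketch using the standard library; `distinct_digests` and `stable_output` are hypothetical names, and `stable_output` is a stand-in for the database-writing code in the gist, not the code itself.

```python
import hashlib
from typing import Callable


def distinct_digests(produce: Callable[[], bytes], runs: int = 5) -> set[str]:
    """Call `produce` `runs` times and return the set of SHA-256 digests seen."""
    return {hashlib.sha256(produce()).hexdigest() for _ in range(runs)}


def stable_output() -> bytes:
    # Hypothetical stand-in: a deterministic producer of output bytes.
    return b"key=value\n" * 4


digests = distinct_digests(stable_output)
# A deterministic producer should yield a single digest; more than one
# digest means the output differs between runs.
print(len(digests))
```

Replacing `stable_output` with a function that returns the bytes of the generated database file would show whether every run really produces different contents.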

The program does not use any concurrency features directly (e.g. no asyncio, no threading, no multiprocessing).

This is happening with Python 3.12 on Linux, using libdb 5.3.28.

Since file reads/writes are at play here, I've experimented with several different methods of reading/writing the relevant file(s) to assess whether incorrect file access patterns could be the cause. In all cases the same issue appeared.
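To illustrate the kind of check described above, here is a minimal sketch that writes the same payload with three different standard-library methods and verifies that each one round-trips intact. The function names and the choice of methods are assumptions for illustration only, not the exact methods used in the tests mentioned above.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_buffered(path: str, data: bytes) -> None:
    # Buffered write through a file object, flushed and synced to disk.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def write_unbuffered(path: str, data: bytes) -> None:
    # Low-level write through a file descriptor, handling partial writes.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)


def write_pathlib(path: str, data: bytes) -> None:
    # Convenience write through pathlib.
    Path(path).write_bytes(data)


def round_trip_ok(writer, data: bytes) -> bool:
    """Write `data` with `writer`, read it back, and compare SHA-256 digests."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload.bin")
        writer(path, data)
        read_back = Path(path).read_bytes()
    return hashlib.sha256(read_back).digest() == hashlib.sha256(data).digest()


payload = os.urandom(64 * 1024)
results = {
    w.__name__: round_trip_ok(w, payload)
    for w in (write_buffered, write_unbuffered, write_pathlib)
}
print(results)
```

If every method round-trips cleanly while the corruption persists, the file access pattern is an unlikely cause.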

[1]: https://gist.github.com/JacobHenner/09e77341603958f5c872a7317ebd2e71