[pybsddb] Batch import is slowing down

Thu Jun 25 01:44:49 CEST 2015

Disclosure: This is a mailing list about the Python bindings for
Berkeley DB, not about Berkeley DB performance. Moreover, I do
professional consulting work on this kind of performance "anomaly" :),
for both Berkeley DB and Python itself.

That said, let's try some analysis.

I don't see anything suspicious in your "db_stat -m" dumps. The number
of pages touched grows (roughly) linearly with the number of nodes
added, as expected.

Before digging deeper into Berkeley DB, let's rule out that you are
hitting a "classical" garbage collection anomaly in Python. Tell me
what you do with the node objects in Python once you finish loading
them. Do you keep the data in RAM? Do your structures contain
circular references?
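
To make the question about circular references concrete, here is a
minimal sketch. The Node class is hypothetical (I have not seen your
code); it only illustrates a parent/child structure where each side
points at the other. Objects like these cannot be freed by reference
counting alone, so they have to wait for the cyclic garbage collector.

```python
import gc


class Node:
    """Hypothetical tree node, NOT taken from the original poster's code."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # child -> parent and parent -> child: this is a reference cycle.
        child.parent = self
        self.children.append(child)


def count_cyclic_garbage():
    """Build a small cycle, drop it, and report what the collector finds."""
    gc.collect()              # start from a clean state
    root = Node("root")
    root.add_child(Node("leaf"))
    del root                  # refcounts never reach zero because of the cycle
    return gc.collect()       # number of unreachable objects found
```

If count_cyclic_garbage() returns a value greater than zero, the
structure contains cycles that only the cyclic collector can reclaim.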

Could you add this at the beginning of your Python code?

  import gc
  gc.disable()
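
If you prefer to limit the experiment to the import itself, something
along these lines may help. It is only a sketch under assumptions: the
names timed_batches, import_without_gc and process are made up for
illustration, and "process" stands for whatever your code does to store
one node. It disables the cyclic collector during the load, records the
seconds spent per batch so the slowdown (or its absence) is visible,
and restores the previous collector state afterwards. It uses
time.perf_counter(), so it needs Python 3.3 or later.

```python
import gc
import time


def timed_batches(items, batch_size, process):
    """Call process(item) for each item; return seconds spent per full batch."""
    timings = []
    start = time.perf_counter()
    count = 0
    for item in items:
        process(item)
        count += 1
        if count % batch_size == 0:
            now = time.perf_counter()
            timings.append(now - start)
            start = now
    return timings


def import_without_gc(items, batch_size, process):
    """Run the import with the cyclic collector off, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return timed_batches(items, batch_size, process)
    finally:
        if was_enabled:
            gc.enable()
```

Comparing the per-batch timings with and without gc.disable() should
tell us whether the collector is involved at all. Reference counting
keeps working while the collector is disabled; only cycles are left
uncollected.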

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
