[pybsddb] Batch import is slowing down

Amirouche Boubekki amirouche at hypermove.net
Thu Jun 25 17:46:50 CEST 2015


On 2015-06-25 13:33, Amirouche Boubekki wrote:
> On 2015-06-25 01:44, Jesus Cea wrote:
>> Disclosure: This is a mailing list about python bindings for Berkeley
>> DB, not about Berkeley DB performance. Moreover, I do professional
>> consulting work on this kind of performance "anomaly" :), for both
>> Berkeley DB and Python itself.
>> 
>> That said, let's try some analysis.
> 
> Ok, thanks. I'm just a free-software and graphdb aficionado, and I'm
> not making any money on this. There are a few downloads. The difficult
> part of the library is bsddb, so I'll probably forward people to you
> if they have any issues.
> 
>> 
>> I don't see anything suspicious in your "db_stat -m" dumps. The pages
>> touched grow (roughly) linearly with the number of nodes added, as
>> expected.
>> 
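
For anyone following along: those dumps come from the stock db_stat
utility that ships with Berkeley DB. A minimal sketch of grabbing them
from Python ("/path/to/env" is a placeholder for the actual environment
directory):

  import subprocess

  # Print the memory pool (cache) statistics for the environment;
  # equivalent to running "db_stat -m -h /path/to/env" in a shell.
  stats = subprocess.check_output(["db_stat", "-m", "-h", "/path/to/env"])
  print(stats.decode())
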
>> Before digging further into Berkeley DB, let's rule out that you are
>> hitting a "classical" garbage collection anomaly in Python. Tell me
>> what you do with the node objects in Python when you finish loading
>> them.
> 
>> Do you keep the data in RAM?
> 
> No, I don't do anything like `vertices.append(vertex)`.
> 
>> Do your structures contain circular references?
> 
> There are some. Should I add some weakrefs?
> 
>> Could you add this at the beginning of your python code?
>> 
>>   import gc
>>   gc.disable()
> 
> It helps a little bit, between 10% and 20%.
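
For context, that gain came from disabling the collector around just
the bulk-load phase, roughly like this (a minimal sketch; load_vertices
is a hypothetical stand-in for my actual import loop):

  import gc

  def load_vertices():
      # hypothetical stand-in for the actual batch-import loop
      pass

  gc.disable()        # suspend cyclic garbage collection during the load
  try:
      load_vertices()
  finally:
      gc.enable()     # restore normal collection once the import is done
      gc.collect()    # reclaim any cycles created while GC was off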

Well, I added weakref.ref objects; the change amounts to roughly the
sketch below. I won't take much more of your time, especially since
this is still early alpha software.
Thanks a lot for your guidance.
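
A minimal sketch of that weakref change (the Vertex class here is a
hypothetical stand-in for my real node type, not the actual one):

  import weakref

  class Vertex(object):
      def __init__(self, graph):
          # Keep the back-reference through a weakref so that the
          # vertex <-> graph cycle doesn't burden the cyclic GC.
          self._graph = weakref.ref(graph)

      @property
      def graph(self):
          # Dereference the weakref; returns None once the graph
          # object itself has been collected.
          return self._graph()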


