[pybsddb] Batch import is slowing down
Amirouche Boubekki
amirouche at hypermove.net
Thu Jun 25 17:46:50 CEST 2015
On 2015-06-25 13:33, Amirouche Boubekki wrote:
> On 2015-06-25 01:44, Jesus Cea wrote:
>> Disclosure: This is a mailing list about the Python bindings for
>> Berkeley DB, not about Berkeley DB performance. Moreover, I do
>> professional consulting work on this kind of performance "anomaly"
>> :), for both Berkeley DB and Python itself.
>>
>> That said, let's try some analysis.
>
> Ok, thanks. I'm just a free software and graphdb aficionado and I'm
> not making any money on this. There are a few downloads. The part of
> the library that is difficult is bsddb, so I'll probably forward
> people to you if they have any issues.
>
>>
>> I don't see anything suspicious in your "db_stat -m" dumps. The
>> number of pages touched grows roughly linearly with the number of
>> nodes added, as expected.
>>
>> Before digging further into Berkeley DB, let's rule out that you
>> are hitting a "classical" garbage collection anomaly in Python.
>> Tell me what you do with the node objects in Python when you finish
>> loading them.
>
>> - Do you keep the data in RAM?
>
> No. I don't do anything like `vertices.append(vertex)`.
>
>> Do your structures contain circular references?
>
> There are some. Should I add some weakrefs?
>
>> Could you add this at the beginning of your Python code?
>>
>> import gc
>> gc.disable()
>
> It helps a little bit, between 10% and 20%.
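For the list archives, here is roughly what the import loop looks
like with the collector disabled (a simplified sketch, not the exact
code; `open_graph` and `graph.save` are placeholders for my own
helpers, not pybsddb API):

    import gc

    def batch_import(path, vertices):
        # Disable the cyclic garbage collector for the duration of
        # the bulk load; with many live objects, every automatic
        # collection pass gets more expensive.
        gc.disable()
        try:
            graph = open_graph(path)   # placeholder setup
            for vertex in vertices:
                graph.save(vertex)     # written straight to disk,
                                       # no reference kept in RAM
        finally:
            gc.enable()
            gc.collect()               # one full pass at the end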
Well, I added weakref.ref objects. I won't take much more of your
time, especially since this is still early alpha software.
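Concretely, the change is along these lines (a minimal sketch with
illustrative Vertex/Edge names, not the actual code):

    import weakref

    class Vertex:
        def __init__(self, uid):
            self.uid = uid
            self.edges = []

    class Edge:
        def __init__(self, start, end):
            # Hold only weak references back to the vertices, so the
            # edge <-> vertex cycle no longer keeps subgraphs alive
            # or feeds the cyclic garbage collector.
            self._start = weakref.ref(start)
            self._end = weakref.ref(end)

        @property
        def start(self):
            return self._start()   # None once the vertex is freed

        @property
        def end(self):
            return self._end()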
Thanks a lot for your guidance.