[pybsddb] Batch import is slowing down

Amirouche Boubekki amirouche at hypermove.net
Wed Jun 24 12:11:38 CEST 2015


On 2015-06-24 12:09, Amirouche Boubekki wrote:
> Hello everybody,
> 
> 
> On 2015-06-21 19:37, Jesus Cea wrote:
>> I really recommend that everybody read the Oracle Berkeley DB
>> documentation. It is really good. Berkeley DB is very flexible, but
>> that flexibility means that you need to learn about its inner
>> workings and implementation details. Skipping this will be
>> frustrating. Invest some time reading the docs.
>> 
>> <https://docs.oracle.com/cd/E17076_04/html/programmer_reference/index.html>.
> 
> Indeed the documentation is very good; you did a great job with it.
> Some things just slipped past me, or I don't remember reading about
> them.
> 
> I've set up checkpoints and log file removal (sketched below the
> quote). Right now my script only checkpoints at the end of the batch
> load of each file. The dataset is split over several 800 MB files.
> 
> With syncless transactions it is quite a bit faster (see the second
> sketch below the quote). But I noticed that loading slows down over
> the course of one file:
> 
> - The first set of 10 000 entries took 3 seconds to load.
> - The 49th set took 2 minutes.
> 
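
For the checkpoint-and-log-removal step mentioned above, here is
roughly what happens after each batch; a minimal sketch using bsddb3,
with a made-up environment path:

    from bsddb3 import db

    env = db.DBEnv()
    env.open('/path/to/env',        # hypothetical path
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOG |
             db.DB_INIT_LOCK | db.DB_INIT_TXN)

    # ... batch-load one 800 MB file inside transactions ...

    env.txn_checkpoint()                # flush dirty pages, write a checkpoint
    env.log_archive(db.DB_ARCH_REMOVE)  # delete log files no longer needed
    env.close()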

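By "syncless transactions" I mean trading durability for speed; a
minimal sketch (the env handle is assumed to be opened as in the
previous snippet):

    from bsddb3 import db

    # Don't fsync the log on every commit: a crash can lose the most
    # recent transactions, but bulk loading gets much faster.
    env.set_flags(db.DB_TXN_NOSYNC, 1)

    txn = env.txn_begin()
    # ... put() the next set of 10 000 entries under txn ...
    txn.commit()        # returns without flushing the log to disk
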
Here is the configuration I use for memory and logs (the corresponding
calls are sketched below):

- 3 GB of cache
- 4 GB cache maximum

The maximum log file size is set to 1 GB.
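
The calls matching this configuration would look roughly like the
following; a sketch, not my exact script, and it assumes a bsddb3
build that exposes set_cache_max (Berkeley DB 4.7+):

    from bsddb3 import db

    env = db.DBEnv()
    env.set_cachesize(3, 0)             # 3 GB of cache (gbytes, bytes)
    env.set_cache_max(4, 0)             # let the cache grow up to 4 GB
    env.set_lg_max(1024 * 1024 * 1024)  # 1 GB maximum per log file
    env.open('/path/to/env',            # hypothetical path
             db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_LOG |
             db.DB_INIT_LOCK | db.DB_INIT_TXN)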

