[pybsddb] Batch import is slowing down
amirouche at hypermove.net
Wed Jun 24 12:11:38 CEST 2015
On 2015-06-24 12:09, Amirouche Boubekki wrote:
> Héllo everybody,
> On 2015-06-21 19:37, Jesus Cea wrote:
>> I really recommend everybody to read the Oracle Berkeley DB
>> documentation. It is really good. Berkeley DB is very flexible but
>> flexibility means that you need to learn about the inner workings and
>> implementation details. Skipping this will be frustrating. Invest
>> some time reading the docs.
> Indeed, the documentation is very good; you did a very good job with
> it. Still, some things slipped through or I don't remember reading about them.
> I've set up checkpoints and log file removal. Right now my script only
> checkpoints at the end of the batch load of one file. The dataset is
> split over several files of 800M each.
> With syncless transactions it is quite a bit faster. But I noticed that
> the loading of data slows down over the course of one file:
> - the first set of 10 000 entries took 3 seconds to load;
> - the 49th set took 2 minutes.
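The batching pattern described above (one transaction per set of 10 000 entries, a checkpoint after each file) can be sketched as follows. This is a minimal, self-contained illustration, not the poster's actual script: the real bsddb3 calls (`env.txn_begin()`, `db.put(..., txn=txn)`, `txn.commit()`, `env.txn_checkpoint()`) are replaced by a small stub so the sketch runs standalone, and the names `FakeEnv`, `load_file`, and `BATCH_SIZE` are illustrative.

```python
# Sketch of the batch-loading pattern: commit every BATCH_SIZE records in
# one transaction, checkpoint once per file. The stub below only counts
# commits and checkpoints; comments name the corresponding bsddb3 calls.

BATCH_SIZE = 10_000

class FakeEnv:
    """Stand-in for a bsddb3 DBEnv: counts transactions and checkpoints."""
    def __init__(self):
        self.commits = 0
        self.checkpoints = 0
    def txn_begin(self):
        return self            # with bsddb3 this returns a DBTxn object
    def commit(self):
        self.commits += 1
    def txn_checkpoint(self):
        self.checkpoints += 1

def load_file(env, records, store):
    """Insert (key, value) records in batches, one transaction per batch."""
    txn = None
    for i, (key, value) in enumerate(records):
        if txn is None:
            txn = env.txn_begin()      # with bsddb3: env.txn_begin()
        store[key] = value             # with bsddb3: database.put(key, value, txn=txn)
        if (i + 1) % BATCH_SIZE == 0:
            txn.commit()               # with bsddb3: txn.commit()
            txn = None
    if txn is not None:
        txn.commit()                   # commit the final partial batch
    env.txn_checkpoint()               # checkpoint once per file, as in the post

env = FakeEnv()
store = {}
load_file(env, ((str(i), i) for i in range(25_000)), store)
print(env.commits, env.checkpoints)    # 3 commits (2 full batches + 1 partial), 1 checkpoint
```

Keeping transactions to a bounded batch size limits how much lock and log state each one accumulates, which is one of the usual levers when a long batch load degrades over time.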
Here is the configuration I use regarding memory and logs:
- 3G of memory,
- 4G of memory max,
- max log size set to 1G.
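For reference, the numbers above would map onto DBEnv settings roughly like this. This is a config sketch assuming the bsddb3 bindings; the environment home directory and the exact flag set are illustrative assumptions, not taken from the post.

```python
from bsddb3 import db

# Illustrative mapping of the settings above onto a DBEnv (bsddb3 bindings);
# the home directory and flag combination are assumptions, not from the post.
env = db.DBEnv()
env.set_cachesize(3, 0, 1)          # 3G of cache memory (gbytes, bytes, ncache)
env.set_cache_max(4, 0)             # 4G of cache memory max (gbytes, bytes)
env.set_lg_max(1 * 1024 ** 3)       # max log file size: 1G
env.set_flags(db.DB_TXN_NOSYNC, 1)  # "syncless" transactions: no fsync on commit
env.open('/path/to/env',
         db.DB_CREATE | db.DB_INIT_MPOOL | db.DB_INIT_TXN |
         db.DB_INIT_LOG | db.DB_INIT_LOCK)
```

With DB_TXN_NOSYNC a crash can lose the most recent committed transactions, which is usually an acceptable trade-off for a re-runnable batch import.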