[pybsddb] Batch import is slowing down

Amirouche Boubekki amirouche at hypermove.net
Wed Jun 24 12:09:41 CEST 2015


Héllo everybody,


On 2015-06-21 19:37, Jesus Cea wrote:
> I really recommend everybody to read the Oracle Berkeley DB
> documentation. It is really good. Berkeley DB is very flexible but that
> flexibility means that you need to learn about the inner working and
> details of implementation. Skipping this will be frustrating. Invest
> some time reading the docs.
> 
> <https://docs.oracle.com/cd/E17076_04/html/programmer_reference/index.html>.

Indeed, the documentation is very good; you did a very good job with
that. Some things I either missed or don't remember reading about.

I've set up checkpoints and log file removal. Right now my script only
checkpoints at the end of the batch load of each file. The dataset is
split over several files of 800M each.

With syncless transactions it is quite a bit faster. But I noticed that
loading slows down over the course of one file:

- The first set of 10 000 entries took 3 seconds to load.
- The 49th set took 2 minutes.
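
For reference, here is a minimal sketch of how per-batch timings like
these can be collected. It is not the actual import script: the names
load_in_batches and put are hypothetical, and put stands in for whatever
call writes one entry (for example a put on the database inside a
transaction). Each batch is read into memory before the clock starts, so
only the writes are timed, not the parsing of the input file.

```python
import time
from itertools import islice


def load_in_batches(entries, put, batch_size=10000, clock=time.perf_counter):
    """Call put(key, value) for every entry and time each batch.

    `put` is a hypothetical stand-in for the real write call.
    Returns a list of (batch_number, entry_count, seconds) tuples.
    """
    timings = []
    iterator = iter(entries)
    batch_number = 0
    while True:
        # Materialise the batch first so reading the input is not timed.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        batch_number += 1
        start = clock()
        for key, value in batch:
            put(key, value)
        elapsed = clock() - start
        timings.append((batch_number, len(batch), elapsed))
    return timings


if __name__ == "__main__":
    # A plain dict replaces the real database in this demonstration.
    store = {}
    data = ((str(i).encode(), b"value") for i in range(25000))
    for number, count, seconds in load_in_batches(data, store.__setitem__):
        print("batch %d: %d entries in %.3f s" % (number, count, seconds))
```

Logging one line per batch this way makes it easy to see whether the
slowdown is gradual or starts abruptly at some batch.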

Is this to be expected?

Thanks in advance,


Amirouche

