[pybsddb] How to manage logs

Amirouche Boubekki amirouche at hypermove.net
Thu Jun 18 15:48:06 CEST 2015


On 2015-06-18 14:26, Lauren Foutz wrote:
> If the environment and database are transaction enabled, then every
> operation will use transactions, regardless of whether you create one
> explicitly.  BDB will create a transaction internally and commit it
> when the operation finishes, or abort it on an error.

So it will create one transaction per operation (get, put, delete). Does 
that give any speed advantage over using transactions explicitly?
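
To make the question concrete, here is a rough sketch of the two styles I am comparing. The function names and the `timed` helper are mine, for illustration only; `env` and `store` are assumed to be an already opened, transaction-enabled `DBEnv` and `DB`.

```python
import time


def put_each_autocommit(store, items):
    """Pass txn=None so each put gets its own implicit transaction,
    as described in the quoted reply."""
    for key, value in items:
        store.put(key, value, txn=None)


def put_all_explicit(env, store, items):
    """Wrap every put in a single explicit transaction."""
    txn = env.txn_begin()
    try:
        for key, value in items:
            store.put(key, value, txn=txn)
    except Exception:
        # Roll back the whole batch if any put fails.
        txn.abort()
        raise
    txn.commit()


def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start
```

Running `timed(put_each_autocommit, store, items)` and `timed(put_all_explicit, env, store, items)` on the same data should show whether there is any difference in practice.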

Is a database created *without* transactions compatible with opening it 
later *with* transactions?

> 
> As for how to reduce the number of logs: using DB_LOG_AUTO_REMOVE is
> a good start, but it will not remove logs until you run a checkpoint.
> So I recommend you execute a checkpoint at regular intervals while
> loading data into the databases.

OK! That's what I was missing.
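
If I understand the suggestion correctly, the loader would commit in batches and run a checkpoint after each batch, roughly like the sketch below. The helper name `load_with_checkpoints`, the batch size, and the `(key, value)` iterable are my own placeholders; `env` and `store` are assumed to be an opened, transaction-enabled `DBEnv` and a `DB` inside it.

```python
from itertools import islice


def load_with_checkpoints(env, store, items, batch_size=10000):
    """Write (key, value) pairs in batches, checkpointing after each one.

    Returns the number of checkpoints taken.
    """
    iterator = iter(items)
    checkpoints = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        txn = env.txn_begin()
        try:
            for key, value in batch:
                store.put(key, value, txn=txn)
        except Exception:
            # Roll back the current batch; earlier batches stay committed.
            txn.abort()
            raise
        txn.commit()
        # Checkpoint so that log files which are no longer needed
        # become eligible for removal.
        env.txn_checkpoint()
        checkpoints += 1
    return checkpoints
```

The batch size is a guess; a smaller value means more frequent checkpoints and fewer log files on disk at any one time, at the cost of more checkpoint work.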

> 
> Also, you should remove the comment that disables DB_INIT_LOG in
> flags,

ok

> and also add the flag DB_INIT_LOCK.

I don't need locks since it's single-threaded, no?
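
For reference, this is how I read the suggested flag set, as a small sketch. The function name `env_open_flags` is mine, and it takes the `bsddb3.db` module as an argument only so that the snippet does not depend on the import. Whether DB_INIT_LOCK is really needed for a single-threaded loader is exactly my question.

```python
def env_open_flags(db):
    """Combine the DBEnv.open() flags discussed in this thread.

    db -- the bsddb3.db module (or any object exposing the same constants)
    """
    return (
        db.DB_CREATE
        | db.DB_INIT_LOG    # was commented out in my code
        | db.DB_INIT_TXN
        | db.DB_INIT_MPOOL
        | db.DB_INIT_LOCK   # suggested addition
    )
```

It would be used as `self._env.open(str(self._path), env_open_flags(db), 0)` after `from bsddb3 import db`.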

Best regards,

> 
> Lauren Foutz
> 
> On 6/18/2015 5:58 AM, Amirouche Boubekki wrote:
>> Héllo,
>> 
>> 
>> I'm loading a dataset (conceptnet5) into Ajgu Db [1] backed by 
>> pybsddb3 '6.0.1' and Berkeley DB 5.3.21.
>> 
>> The problem I have is that even when I'm not using transactions 
>> (passing txn=None) my database fills the disk with log files. There are 
>> 2.3 GB of database files (including __db.* files) out of 429 GB total 
>> disk space used by the database directory (du -h .).
>> 
>> How can I remove those log files during the import of the database? 
>> Right now the script can't even finish loading the first file 
>> of the dataset.
>> 
>> My db environment is configured as follows:
>> 
>> ```
>>         # init bsddb3
>>         self._env = DBEnv()
>>         self._env.set_cache_max(*max_cache_size)
>>         self._env.set_cachesize(*cache_size)
>>         flags = (
>>             DB_CREATE
>>             # | DB_INIT_LOG
>>             | DB_INIT_TXN
>>             | DB_INIT_MPOOL
>>         )
>>         self._env.set_flags(DB_LOG_AUTO_REMOVE, True)
>>         self._env.open(
>>             str(self._path),
>>             flags,
>>             0
>>         )
>> ```
>> https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB/blob/f8bf004ee132ac21fcbbb1c925889a16f1d5388d/ajgu/storage.py#L62 
>> 
>> Every single store is created with the following function:
>> 
>> ```
>>         # create vertices and edges k/v stores
>>         def new_store(name, method):
>>             txn = self._txn()
>>             flags = DB_CREATE
>>             elements = DB(self._env)
>>             elements.open(
>>                 name,
>>                 None,
>>                 method,
>>                 flags,
>>                 0,
>>                 txn=txn._txn
>>             )
>>             txn.commit()
>>             return elements
>> ```
>> 
>> 
>> 
>> [1] https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB
>> 
>> 
>> Regards,
>> 
> 
> _______________________________________________
> pybsddb mailing list
> pybsddb at jcea.es
> https://mailman.jcea.es/listinfo/pybsddb
> http://www.jcea.es/programacion/pybsddb.htm

-- 
Amirouche ~ amz3 ~ http://www.hyperdev.fr

