[pybsddb] How to manage logs

Lauren Foutz lauren.foutz at oracle.com
Thu Jun 18 16:02:26 CEST 2015


On 6/18/2015 9:48 AM, Amirouche Boubekki wrote:
> On 2015-06-18 14:26, Lauren Foutz wrote:
>> If the environment and database are transaction enabled, then every
>> operation will use transactions, regardless of whether you create one
>> explicitly.  BDB will create a transaction internally and commit it
>> when the operation finishes, or abort it on an error.
>
> So it will create one transaction per operation (get, put, delete)?

Yes.

> Does it provide any speed advantage over using transactions explicitly?

No, it tends to be slower since each commit requires that the logs be 
flushed to disk.  It is better to use an explicit transaction, and use 
the same transaction over multiple put/get/delete operations.
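
To illustrate the point about reusing one transaction, here is a minimal 
sketch in Python. It assumes `env` is an already opened, 
transaction-enabled bsddb3 `DBEnv` and `database` is a `DB` opened in it. 
The helper names `commit_batch` and `put_many` and the `batch_size` 
parameter are made up for this example and are not part of pybsddb.

```python
def commit_batch(env, database, batch):
    """Write one batch of (key, value) pairs inside a single transaction."""
    txn = env.txn_begin()
    try:
        for key, value in batch:
            database.put(key, value, txn=txn)
    except Exception:
        # Roll back the whole batch if any put fails.
        txn.abort()
        raise
    # One commit for the whole batch instead of one per put.
    txn.commit()


def put_many(env, database, items, batch_size=1000):
    """Store items in batches; return the number of transactions committed."""
    commits = 0
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            commit_batch(env, database, batch)
            commits += 1
            batch = []
    if batch:
        # Commit whatever is left over after the last full batch.
        commit_batch(env, database, batch)
        commits += 1
    return commits
```

A call such as `put_many(env, database, pairs)` would then stand in for a 
loop of individual `database.put(key, value)` calls. The batch size of 
1000 is an arbitrary starting value, not a recommendation from this thread.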

>
> Is a database created *without* transaction compatible with opening it 
> later *with* transactions?

Short answer: no. A database needs to either always support 
transactions or never support them.  Long answer: if you use 
the function DB_ENV->lsn_reset() to reset the log sequence numbers in 
each of the database files, and then delete the environment files (those 
that start with __db), you may be able to re-open the databases in a new 
transactionally enabled environment.  But I am not certain that will 
work.  The safe bet is to either have the database always support 
transactions, or never support transactions.
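
For readers who want to try the untested "long answer", the two steps 
could be sketched as below. This assumes `env` is an open bsddb3 `DBEnv` 
handle exposing `lsn_reset()`, and that the environment is closed before 
its `__db` files are deleted. The function names and the example file 
names are hypothetical, and since the procedure itself is not confirmed 
to work, it should only be tried on a copy of the data.

```python
import os


def reset_lsns(env, database_paths):
    """Call lsn_reset on each database file through an open DBEnv handle."""
    for path in database_paths:
        env.lsn_reset(path)


def remove_region_files(home):
    """Delete files in `home` whose names start with "__db"; return the names."""
    removed = []
    for name in sorted(os.listdir(home)):
        path = os.path.join(home, name)
        if name.startswith("__db") and os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

The intended order would be `reset_lsns(env, ["vertices.db", "edges.db"])`, 
then `env.close()`, then `remove_region_files(home)`, where the two file 
names are placeholders for the actual database files.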

>
>>
>> As for how to reduce the number of logs: using DB_LOG_AUTO_REMOVE is
>> a good start, but it will not remove logs until you run a checkpoint.
>> So I recommend you execute a checkpoint at regular intervals while
>> loading data into the databases.
>
> Ok! that's what I was missing.
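
A rough sketch of checkpointing at regular intervals during a bulk load, 
assuming `env` is an open transactional bsddb3 `DBEnv`. The names 
`load_with_checkpoints`, `write_batch`, and `checkpoint_every` are 
hypothetical; `write_batch` stands for whatever code stores one chunk of 
the dataset.

```python
def load_with_checkpoints(env, batches, write_batch, checkpoint_every=100):
    """Write each batch and checkpoint after every `checkpoint_every` batches.

    Returns the total number of checkpoints requested.
    """
    checkpoints = 0
    for count, batch in enumerate(batches, start=1):
        write_batch(batch)
        if count % checkpoint_every == 0:
            # Ask Berkeley DB to write a checkpoint.
            env.txn_checkpoint()
            checkpoints += 1
    # Final checkpoint once the load is finished.
    env.txn_checkpoint()
    return checkpoints + 1
```

The interval of 100 batches is an arbitrary placeholder. A shorter 
interval should keep fewer log files on disk at the cost of more 
checkpoint work.
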
>
>>
>> Also, you should uncomment DB_INIT_LOG in the flags so that it is no
>> longer disabled,
>
> ok
>
>> and also add the flag DB_INIT_LOCK.
>
> I don't need locks since it's single threaded, no?

Transactions assume locks are used, regardless of whether the program 
is single threaded or not.  Using transactions without locks can lead to 
undefined behavior, such as a program crash due to accessing 
uninitialized memory.
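
Put together, the environment flags discussed in this thread might look 
like the sketch below. The function name `open_transactional_env` is 
hypothetical, and the bsddb3 `db` module is passed in as a parameter only 
so that the sketch has no hard import. In a real program one would write 
`from bsddb3 import db` and call `open_transactional_env(db, path)`.

```python
def open_transactional_env(db, home):
    """Open a DBEnv in `home` with logging, locking and transactions enabled.

    `db` is expected to be the bsddb3 `db` module.
    """
    flags = (
        db.DB_CREATE
        | db.DB_INIT_MPOOL  # shared memory pool (cache)
        | db.DB_INIT_LOG    # logging subsystem, needed by transactions
        | db.DB_INIT_LOCK   # locking subsystem, needed even if single threaded
        | db.DB_INIT_TXN    # transaction subsystem
    )
    env = db.DBEnv()
    env.open(home, flags, 0)
    return env
```

Compared with the configuration in the original post, the only change is 
that DB_INIT_LOG is no longer commented out and DB_INIT_LOCK is added.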

Lauren Foutz

>
> Best regards,
>
>>
>> Lauren Foutz
>>
>> On 6/18/2015 5:58 AM, Amirouche Boubekki wrote:
>>> Héllo,
>>>
>>>
>>> I'm loading a dataset (conceptnet5) into Ajgu Db [1] backed by 
>>> pybsddb3 '6.0.1' and Berkeley DB 5.3.21.
>>>
>>> The problem I have is that even when I'm not using transactions 
>>> (passing txn=None) my database fills the disk with log files. There 
>>> are 2.3 GB of database files (including __db.* files) out of 429 GB 
>>> total disk space used by the database directory (du -h .).
>>>
>>> How can I remove those log files during the import of the database? 
>>> Right now the script can't even finish the loading of the first file 
>>> of the dataset.
>>>
>>> My db environment is configured as follows:
>>>
>>> ```
>>>         # init bsddb3
>>>         self._env = DBEnv()
>>>         self._env.set_cache_max(*max_cache_size)
>>>         self._env.set_cachesize(*cache_size)
>>>         flags = (
>>>             DB_CREATE
>>>             # | DB_INIT_LOG
>>>             | DB_INIT_TXN
>>>             | DB_INIT_MPOOL
>>>         )
>>>         self._env.set_flags(DB_LOG_AUTO_REMOVE, True)
>>>         self._env.open(
>>>             str(self._path),
>>>             flags,
>>>             0
>>>         )
>>> ```
>>> https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB/blob/f8bf004ee132ac21fcbbb1c925889a16f1d5388d/ajgu/storage.py#L62 
>>>
>>> Every single store is created with the following function:
>>>
>>> ```
>>>         # create vertices and edges k/v stores
>>>         def new_store(name, method):
>>>             txn = self._txn()
>>>             flags = DB_CREATE
>>>             elements = DB(self._env)
>>>             elements.open(
>>>                 name,
>>>                 None,
>>>                 method,
>>>                 flags,
>>>                 0,
>>>                 txn=txn._txn
>>>             )
>>>             txn.commit()
>>>             return elements
>>> ```
>>>
>>>
>>>
>>> [1] https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB
>>>
>>>
>>> Regards,
>>>
>>
>> _______________________________________________
>> pybsddb mailing list
>> pybsddb at jcea.es
>> https://mailman.jcea.es/listinfo/pybsddb
>> http://www.jcea.es/programacion/pybsddb.htm
>


