[pybsddb] How to manage logs

Amirouche Boubekki amirouche at hypermove.net
Thu Jun 18 16:15:10 CEST 2015


On 2015-06-18 16:02, Lauren Foutz wrote:
> On 6/18/2015 9:48 AM, Amirouche Boubekki wrote:
>> On 2015-06-18 14:26, Lauren Foutz wrote:
>>> If the environment and database are transaction enabled, then every
>>> operation will use transactions, regardless of whether you create one
>>> explicitly.  BDB will create a transaction internally and commit it
>>> when the operation finishes, or abort it on an error.
>> 
>> It will create one transaction per operation (get, put, delete).
> 
> Yes.
> 
>> Does it provide any speed over using transaction explicitly?
> 
> No, it tends to be slower since each commit requires that the logs be
> flushed to disk.  It is better to use an explicit transaction, and use
> the same transaction over multiple put/get/delete operations.

So either:

- I use larger transactions that span several operations, or
- I never use transactions for a given database.
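
A minimal sketch of the first option: group many puts under one explicit
transaction, so the log is flushed once per batch rather than once per
operation. The helper name `put_batch` and the `batch_size` default are
hypothetical. `env` is assumed to be an open, transaction-enabled `DBEnv` and
`store` a `DB` opened inside it; only `txn_begin()`, `put(..., txn=...)`,
`commit()` and `abort()` come from pybsddb.

```python
from itertools import islice


def put_batch(env, store, items, batch_size=10000):
    """Write (key, value) pairs using one explicit transaction per batch.

    Hypothetical helper: `env` is assumed to be an open, transactional
    bsddb3 DBEnv and `store` a DB opened inside that environment.
    Returns the number of transactions committed.
    """
    iterator = iter(items)
    committed = 0
    while True:
        # Take the next slice of pairs; an empty slice means we are done.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return committed
        txn = env.txn_begin()
        try:
            for key, value in batch:
                store.put(key, value, txn=txn)
        except Exception:
            # Roll back the whole batch if any single put fails.
            txn.abort()
            raise
        txn.commit()
        committed += 1
```

A suitable `batch_size` depends on the cache and lock table sizes, so the
default above is only a placeholder to tune.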

>> 
>> Is a database created *without* transaction compatible with opening it 
>> later *with* transactions?
> 
> Short answer: no, a database needs to either always support
> transactions, or never support transactions.  Long answer: if you use
> the function DB_ENV->lsn_reset() to reset the log number in each of
> the database files, and then delete the environment files (those that
> start with __db), you may be able to re-open the databases in a new
> transactionally enabled environment.  But I am not certain that will
> work.  The safe bet is to either have the database always support
> transactions, or never support transactions.
> 
>> 
>>> 
>>> As for how to reduce the number of logs.  Using DB_LOG_AUTO_REMOVE is
>>> a good start, but it will not remove logs until you run a checkpoint.
>>> So I recommend you execute a checkpoint at regular intervals while
>>> loading data into the databases.
>> 
>> Ok! that's what I was missing.

I made the changes; it looks better for now.
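
A minimal sketch of checkpointing at regular intervals during a bulk load,
under the same assumptions as the advice quoted above: logs only become
removable after a checkpoint. The function name and the `batch_size` and
`checkpoint_every` defaults are hypothetical; `env` is assumed to be an open,
transaction-enabled `DBEnv` and `store` a `DB` opened inside it.
`txn_checkpoint()` is called with its default arguments.

```python
def load_with_checkpoints(env, store, items, batch_size=10000,
                          checkpoint_every=10):
    """Load (key, value) pairs in batched transactions with checkpoints.

    Hypothetical helper: commits every `batch_size` puts and runs a
    checkpoint every `checkpoint_every` commits, plus once at the end.
    Returns the number of checkpoint calls made.
    """
    checkpoints = 0
    commits = 0
    pending = 0
    txn = None
    try:
        for key, value in items:
            if txn is None:
                txn = env.txn_begin()
            store.put(key, value, txn=txn)
            pending += 1
            if pending >= batch_size:
                # Clear `txn` first so a failed commit is not aborted again.
                current, txn = txn, None
                current.commit()
                pending = 0
                commits += 1
                if commits % checkpoint_every == 0:
                    env.txn_checkpoint()
                    checkpoints += 1
        if txn is not None:
            # Commit the final, partially filled batch.
            current, txn = txn, None
            current.commit()
    except Exception:
        if txn is not None:
            txn.abort()
        raise
    # Final checkpoint so the last log files can be reclaimed too.
    env.txn_checkpoint()
    return checkpoints + 1
```

Checkpointing more often keeps fewer log files on disk at the cost of extra
I/O, so both intervals are values to tune rather than recommendations.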

>>> 
>>> Also, you should remove the comment that disables DB_INIT_LOG in
>>> flags,
>> 
>> ok
>> 
>>> and also add the flag DB_INIT_LOCK.
>> 
>> I don't need locks since it's single-threaded, no?
> 
> Transactions assume locks are used, regardless of whether the program
> is single threaded or not.  Using transactions without locks can lead
> to undefined behavior such as a program crash due to accessing
> uninitialized memory.

OK, thanks for your quick responses.
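
A minimal sketch of the environment flags after both suggestions: DB_INIT_LOG
is enabled again and DB_INIT_LOCK is added, even for a single-threaded
program. The function name is hypothetical, and `bdb` is assumed to be the
`bsddb3.db` module, passed in as an argument rather than imported here.

```python
def open_transactional_env(bdb, path):
    """Open a DBEnv with every subsystem that transactions rely on.

    Hypothetical helper: `bdb` is assumed to be the `bsddb3.db` module and
    `path` an existing directory for the environment files.
    """
    flags = (
        bdb.DB_CREATE
        | bdb.DB_INIT_LOCK   # transactions assume locking, even single-threaded
        | bdb.DB_INIT_LOG    # transactions need the logging subsystem
        | bdb.DB_INIT_MPOOL
        | bdb.DB_INIT_TXN
    )
    env = bdb.DBEnv()
    env.open(str(path), flags, 0)
    return env
```

With the real library this would be called as
`open_transactional_env(bsddb3.db, path)`; cache sizes and log options would
still have to be set on the `DBEnv` before `open()`, as in the original
configuration.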

> 
> Lauren Foutz
> 
>> 
>> Best regards,
>> 
>>> 
>>> Lauren Foutz
>>> 
>>> On 6/18/2015 5:58 AM, Amirouche Boubekki wrote:
>>>> Héllo,
>>>> 
>>>> 
>>>> I'm loading a dataset (conceptnet5) into Ajgu Db [1] backed by 
>>>> pybsddb3 '6.0.1' and Berkeley DB 5.3.21.
>>>> 
>>>> The problem I have is that even when I'm not using transactions 
>>>> (passing txn=None) my database fills the disk with log files. There
>>>> are 2.3 GB of database files (including __db.* files) out of 429 GB
>>>> total disk space used by the database directory (du -h .).
>>>> 
>>>> How can I remove those log files during the import of the database?
>>>> Right now the script can't even finish the loading of the first file 
>>>> of the dataset.
>>>> 
>>>> My db environment is configured as follows:
>>>> 
>>>> ```
>>>>         # init bsddb3
>>>>         self._env = DBEnv()
>>>>         self._env.set_cache_max(*max_cache_size)
>>>>         self._env.set_cachesize(*cache_size)
>>>>         flags = (
>>>>             DB_CREATE
>>>>             # | DB_INIT_LOG
>>>>             | DB_INIT_TXN
>>>>             | DB_INIT_MPOOL
>>>>         )
>>>>         self._env.set_flags(DB_LOG_AUTO_REMOVE, True)
>>>>         self._env.open(
>>>>             str(self._path),
>>>>             flags,
>>>>             0
>>>>         )
>>>> ```
>>>> https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB/blob/f8bf004ee132ac21fcbbb1c925889a16f1d5388d/ajgu/storage.py#L62
>>>> 
>>>> Every single store is created with the following function:
>>>> 
>>>> ```
>>>>         # create vertices and edges k/v stores
>>>>         def new_store(name, method):
>>>>             txn = self._txn()
>>>>             flags = DB_CREATE
>>>>             elements = DB(self._env)
>>>>             elements.open(
>>>>                 name,
>>>>                 None,
>>>>                 method,
>>>>                 flags,
>>>>                 0,
>>>>                 txn=txn._txn
>>>>             )
>>>>             txn.commit()
>>>>             return elements
>>>> ```
>>>> 
>>>> 
>>>> 
>>>> [1] https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB
>>>> 
>>>> 
>>>> Regards,
>>>> 
>>> 
>>> _______________________________________________
>>> pybsddb mailing list
>>> pybsddb at jcea.es
>>> https://mailman.jcea.es/listinfo/pybsddb
>>> http://www.jcea.es/programacion/pybsddb.htm
>> 
> 

-- 
Amirouche ~ amz3 ~ http://www.hyperdev.fr

