[pybsddb] How to manage logs
Amirouche Boubekki
amirouche at hypermove.net
Thu Jun 18 16:15:10 CEST 2015
On 2015-06-18 16:02, Lauren Foutz wrote:
> On 6/18/2015 9:48 AM, Amirouche Boubekki wrote:
>> On 2015-06-18 14:26, Lauren Foutz wrote:
>>> If the environment and database are transaction enabled, then every
>>> operation will use transactions, regardless of whether you create one
>>> explicitly. BDB will create a transaction internally and commit it
>>> when the operation finishes, or abort it on an error.
>>
>> It will create one transaction per operation (get, put, delete).
>
> Yes.
>
>> Does it provide any speed over using transaction explicitly?
>
> No, it tends to be slower since each commit requires that the logs be
> flushed to disk. It is better to use an explicit transaction, and use
> the same transaction over multiple put/get/delete operations.
So:
- I should use larger transactions instead,
- or never use transactions for a given database.
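
A minimal sketch of the first option, grouping many puts under one explicit transaction so the log is flushed once per batch rather than once per operation. The helper names (put_many, _commit_batch) and the batch_size default are hypothetical; env and db are assumed to be an already opened bsddb3 DBEnv and DB, and only txn_begin(), put(..., txn=...), commit() and abort() are used:

```python
def _commit_batch(env, db, batch):
    """Write one batch of (key, value) pairs inside a single transaction."""
    txn = env.txn_begin()
    try:
        for key, value in batch:
            db.put(key, value, txn=txn)
    except Exception:
        # Roll back the whole batch if any put fails.
        txn.abort()
        raise
    txn.commit()


def put_many(env, db, items, batch_size=10000):
    """Write all items, committing once per batch; return the commit count."""
    commits = 0
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            _commit_batch(env, db, batch)
            commits += 1
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        _commit_batch(env, db, batch)
        commits += 1
    return commits
```

The batch size is a trade-off: larger batches mean fewer log flushes, but more work is lost if one batch is aborted.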
>>
>> Is a database created *without* transaction compatible with opening it
>> later *with* transactions?
>
> Short answer: no. A database needs to either always support
> transactions, or never support transactions. Long answer: if you use
> the function DB_ENV->lsn_reset() to reset the log number in each of
> the database files, and then delete the environment files (those that
> start with __db), you may be able to re-open the databases in a new
> transactionally enabled environment. But I am not certain that will
> work. The safe bet is to either have the database always support
> transactions, or never support transactions.
>
>>
>>>
>>> As for how to reduce the number of logs: using DB_LOG_AUTO_REMOVE is
>>> a good start, but it will not remove logs until you run a checkpoint.
>>> So I recommend you execute a checkpoint at regular intervals while
>>> loading data into the databases.
>>
>> Ok! that's what I was missing.
I made the changes, it looks better for now.
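
For the archive, a rough sketch of the periodic checkpoint described above. The function name, the write_batch callback and the checkpoint_every default are hypothetical; env is assumed to be an opened bsddb3 DBEnv, and the only library call is txn_checkpoint():

```python
def load_with_checkpoints(env, batches, write_batch, checkpoint_every=100):
    """Write each batch, running a checkpoint every `checkpoint_every` batches.

    write_batch is a caller-supplied function that commits one batch.
    Returns the number of batches written.
    """
    done = 0
    for batch in batches:
        write_batch(batch)
        done += 1
        if done % checkpoint_every == 0:
            # Log files can only be removed once a checkpoint has run.
            env.txn_checkpoint()
    # Final checkpoint so the tail of the load is covered as well.
    env.txn_checkpoint()
    return done
```

How often to checkpoint depends on how much log data each batch produces; the interval here is only a placeholder.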
>>>
>>> Also, you should remove the comment that disables DB_INIT_LOG in
>>> flags,
>>
>> ok
>>
>>> and also add the flag DB_INIT_LOCK.
>>
>> I don't need locks since it's single-threaded, no?
>
> Transactions assume locks are used, regardless of whether the program
> is single threaded or not. Using transactions without locks can lead
> to undefined behavior such as a program crash due to accessing
> uninitialized memory.
Ok thanks for your quick responses.
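
Putting both flag suggestions together, the environment flags would look roughly like this. The helper name is hypothetical, and db_module is assumed to be bsddb3.db (anything exposing the same DB_* constants would do):

```python
def transactional_env_flags(db_module):
    """Return DBEnv.open() flags for a transactional environment.

    Transactions need the logging and locking subsystems as well,
    so DB_INIT_LOG and DB_INIT_LOCK are set alongside DB_INIT_TXN.
    """
    return (
        db_module.DB_CREATE
        | db_module.DB_INIT_MPOOL
        | db_module.DB_INIT_LOG
        | db_module.DB_INIT_LOCK
        | db_module.DB_INIT_TXN
    )
```

The result would be passed as the flags argument of self._env.open(...) in place of the original flags value.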
>
> Lauren Foutz
>
>>
>> Best regards,
>>
>>>
>>> Lauren Foutz
>>>
>>> On 6/18/2015 5:58 AM, Amirouche Boubekki wrote:
>>>> Héllo,
>>>>
>>>>
>>>> I'm loading a dataset (conceptnet5) into Ajgu Db [1] backed by
>>>> pybsddb3 '6.0.1' and Berkeley DB 5.3.21.
>>>>
>>>> The problem I have is that even when I'm not using transactions
>>>> (passing txn=None) my database fills the disk with log files. There
>>>> is 2.3 GB of database files (including __db.* files) out of 429 GB
>>>> total disk space used by the database directory (du -h .).
>>>>
>>>> How can I remove those log files during the import of the database?
>>>> Right now the script can't even finish the loading of the first file
>>>> of the dataset.
>>>>
>>>> My db environment is configured as follows:
>>>>
>>>> ```
>>>> # init bsddb3
>>>> self._env = DBEnv()
>>>> self._env.set_cache_max(*max_cache_size)
>>>> self._env.set_cachesize(*cache_size)
>>>> flags = (
>>>>     DB_CREATE
>>>>     # | DB_INIT_LOG
>>>>     | DB_INIT_TXN
>>>>     | DB_INIT_MPOOL
>>>> )
>>>> self._env.set_flags(DB_LOG_AUTO_REMOVE, True)
>>>> self._env.open(
>>>>     str(self._path),
>>>>     flags,
>>>>     0
>>>> )
>>>> ```
>>>> https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB/blob/f8bf004ee132ac21fcbbb1c925889a16f1d5388d/ajgu/storage.py#L62
>>>> Every single store is created with the following function:
>>>>
>>>> ```
>>>> # create vertices and edges k/v stores
>>>> def new_store(name, method):
>>>>     txn = self._txn()
>>>>     flags = DB_CREATE
>>>>     elements = DB(self._env)
>>>>     elements.open(
>>>>         name,
>>>>         None,
>>>>         method,
>>>>         flags,
>>>>         0,
>>>>         txn=txn._txn
>>>>     )
>>>>     txn.commit()
>>>>     return elements
>>>> ```
>>>>
>>>>
>>>>
>>>> [1] https://git.framasoft.org/python-graphiti-love-story/AjguGraphDB
>>>>
>>>>
>>>> Regards,
>>>>
>>>
>>> _______________________________________________
>>> pybsddb mailing list
>>> pybsddb at jcea.es
>>> https://mailman.jcea.es/listinfo/pybsddb
>>> http://www.jcea.es/programacion/pybsddb.htm
>>
>
--
Amirouche ~ amz3 ~ http://www.hyperdev.fr