LuceneIndexProviderService copyonread/write running into too many open file handles


LuceneIndexProviderService copyonread/write running into too many open file handles

Dirk Rudolph
Hi,

we recently faced the issue that our Oak-based enterprise content management system ran into failures due to too many open files. Monitoring the lsof output, we found that most of the files opened by the process are files within the configured localIndexDir of the LuceneIndexProviderService. We have copyonread and copyonwrite enabled.

Are there any known limitations with handling open files related to those two options? If so, I would naively expect the implementation to manage file handles following some kind of LRU pattern and to allow configuring a maximum number of file handles to use.
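For illustration only, the kind of LRU-bounded handle cache I have in mind could look like the sketch below; Handle and LruHandleCache are hypothetical names, and none of this is existing Oak or Lucene API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruHandleCache {
    // Hypothetical stand-in for an open file handle.
    static class Handle {
        final String path;
        boolean open = true;
        Handle(String path) { this.path = path; }
        void close() { open = false; }
    }

    // Cap the number of open handles; close the least-recently-used
    // one when the cap is exceeded.
    static Map<String, Handle> newCache(final int maxOpen) {
        // accessOrder=true makes iteration order least-recently-used first
        return new LinkedHashMap<String, Handle>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Handle> eldest) {
                if (size() > maxOpen) {
                    eldest.getValue().close(); // release before evicting
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Handle> cache = newCache(2);
        cache.put("a", new Handle("a"));
        cache.put("b", new Handle("b"));
        cache.get("a");                  // touch "a": "b" becomes eldest
        cache.put("c", new Handle("c")); // evicts and closes "b"
        System.out.println("open=" + cache.keySet());
    }
}
```

The eviction logic rides on LinkedHashMap's access-ordered mode, which keeps the least-recently-used entry first and lets removeEldestEntry decide when to drop it.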

Talking in numbers: after a fresh restart of the process we have about 20k open files; 13k are index files, 2.5k segmentstore, and most of the others are JAR files. The ulimit is already set to more than 65k, but the instance crashed with more than 75k open file handles.

Many thanks in advance,

/Dirk



Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Chetan Mehrotra
On Tue, Mar 28, 2017 at 7:30 PM, Dirk Rudolph
<[hidden email]> wrote:
> Monitoring the lsof output we found out that most of the opened files of the process are the files within the configured localIndexDir of the LuceneIndexProviderService.

Lucene can open a lot of file handles. Would it be possible for you to
open an issue and attach the lsof output (possibly scrubbing private
info)? I want to see the file handle pattern.

Chetan Mehrotra

Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Clay Ferguson
In reply to this post by Dirk Rudolph
This type of problem is most often caused by failing to put close() calls
in finally blocks. All resource management in Java needs to be handled in
finally blocks, or else resource leaks can easily bring a server down
through memory or handle exhaustion. There may be places in the Oak source
that aren't doing this, I don't know. Of course, the other possibility,
Dirk, is that your own code may not be calling close() on everything it
needs to.
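A minimal sketch of that pattern, with a made-up ToyResource standing in for any handle-backed resource (a stream, an index reader, and so on); try-with-resources is the modern equivalent of close() in a finally block:

```java
public class CloseDemo {
    static class ToyResource implements AutoCloseable {
        static boolean closed = false;
        void work() { throw new RuntimeException("boom"); }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() runs no matter how
        // the block exits, even when work() throws.
        try (ToyResource r = new ToyResource()) {
            r.work();
        } catch (RuntimeException ignored) {
            // the exception still propagates, but close() has already run
        }
        System.out.println("closed=" + ToyResource.closed);
    }
}
```

Without the try-with-resources (or an explicit finally), the exception would skip the close() call and the handle would leak.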

The same is true for session.logout(). Make sure you aren't holding open a
bunch of sessions at once (for example on multiple threads, or by failing
to call logout() in some processing loop). And finally, make sure that when
you process large batches of operations you don't build up too many changes
before doing a commit with session.save(). I have heard of apps having
problems in the past simply from trying to do too much in the same
"commit".
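A hedged sketch of both points together, using a minimal stand-in interface for javax.jcr.Session (the batch size and the counting harness are illustrative, not real Oak/JCR code):

```java
public class BatchDemo {
    // Stand-in for javax.jcr.Session; only the two calls we care about.
    interface Session { void save(); void logout(); }

    public static void main(String[] args) {
        final int[] saves = {0};
        final boolean[] loggedOut = {false};
        Session session = new Session() {
            public void save() { saves[0]++; }
            public void logout() { loggedOut[0] = true; }
        };

        final int BATCH_SIZE = 100; // tune to the workload
        try {
            for (int i = 1; i <= 250; i++) {
                // ... modify a node for item i ...
                if (i % BATCH_SIZE == 0) {
                    session.save(); // commit periodically, not all at once
                }
            }
            session.save();         // flush the final partial batch
        } finally {
            session.logout();       // always release the session
        }
        System.out.println("saves=" + saves[0] + " loggedOut=" + loggedOut[0]);
    }
}
```

Here 250 items produce three save() calls (two full batches plus the final flush), and logout() runs even if the loop throws.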


Best regards,
Clay Ferguson
[hidden email]



Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Dirk Rudolph
Issue created with the lsof output attached here:

https://issues.apache.org/jira/browse/OAK-5995

Thanks for the input.

As far as we know, we properly close sessions wherever we open them. Besides, I don't understand why open sessions should keep index files open. Nor do we handle those files ourselves, or even access classes of oak-lucene from our codebase. So from my perspective it's not the application code causing this, though it might be the enterprise content management system.

Cheers
Dirk Rudolph | Senior Software Engineer

Netcentric AG

M: +41 79 642 07 99

[hidden email] | www.netcentric.biz



Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Clay Ferguson
Dirk,
Based on all the open indexes, it may be that you are accidentally opening
multiple repository connections when you really only need one for the
entire app (I'm just guessing). Shutdown can be tricky; here's code from my
own project:

https://github.com/Clay-Ferguson/meta64/blob/master/src/main/java/com/meta64/mobile/repo/OakRepository.java

Note these:

executor.shutdown();
nodeStore.dispose();
indexProvider.close();   // <-- this one
repository.shutdown();
mongo.close();

A lot of stuff going on just to shut down.

Also, Runtime.getRuntime().addShutdownHook() ensures the cleanup can never
be bypassed.
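A minimal sketch of that hook pattern; closeEverything() here only mirrors the ordering of the snippet above with placeholder comments, and is not taken from Oak itself:

```java
public class ShutdownDemo {
    static void closeEverything() {
        // In a real app, release resources in dependency order, e.g.:
        // executor.shutdown();
        // nodeStore.dispose();
        // indexProvider.close();
        // repository.shutdown();
        // mongo.close();
        System.out.println("resources closed");
    }

    public static void main(String[] args) {
        // The hook runs when the JVM exits normally or on SIGTERM,
        // so cleanup cannot be bypassed by an unexpected exit path.
        Runtime.getRuntime().addShutdownHook(new Thread(ShutdownDemo::closeEverything));
        System.out.println("main done");
        // the hook fires after main returns
    }
}
```

Note that shutdown hooks do not run on SIGKILL or a JVM crash, so they complement, rather than replace, explicit close() calls.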


Best regards,
Clay Ferguson
[hidden email]



Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Dirk Rudolph
This is managed by the product we use.

Thanks, Dirk

--

Dirk Rudolph | Senior Software Engineer

Netcentric AG

M: +41 79 642 37 11
D: +49 174 966 84 34

[hidden email] | www.netcentric.biz

Re: LuceneIndexProviderService copyonread/write running into too many open file handles

Chetan Mehrotra
In reply to this post by Dirk Rudolph

I'll follow up on the issue then.

Chetan Mehrotra