Using Oak for distributed system

Ioan Eugen Stan
Hello,

I would like to ask for some opinions and some direction on using
Apache Oak technology in our products.

Story below:

We are building a SaaS system to provide value to our line of business.
We have had our platform up and running for some time and we are growing
our business.

However, we need a sort of CMS, as we need to be able to update content
(pages, texts in reports, legal documents like terms of use and privacy
policy) without making a new release. We also have content that needs to
be translated into many languages, and we need to be able to coordinate
the translation process and sync all the i18n content between our system
and the translation platform.

I believe that a CMS based on Apache Oak is a very good solution for our
needs, at least on paper, since we don't have any experience with Oak and
JCR in production systems.

I have attached a sample deployment diagram that I believe we should
migrate to. Right now we are missing the Sling CMS and Oak repository
components. We are storing templates and content on disk and need to
make a new release to push updates. We can do that quite easily and
quickly, as we have an automated deployment pipeline, but the process
would be greatly improved by using Oak features and a CMS.

I have a lot of questions and I hope I can get some answers. Some of
them might be more appropriate on the Sling mailing list and I will ask
them there.

I hope this will serve as a nice use case for Oak, and I plan to write
some articles about the experience.

We have some services:

Sling CMS Portal - Main entry point into the app. Displays pages and
portal content. Read/write access to some content in the repo.

API Service - API gateway and API for the business rules. Should be able
to read/write the Oak repo.

Oak Repository - Stores content such as email templates, account/user
logos, etc.

Report Rendering Service - Used by the API to render PDF reports. Needs
read-only access to the repository. Should work without a connection to
the repository.


The amount of content we need to store currently in the Oak repository
is under 1GB, stored as files.


Questions (numbered for easy reference):

1. Can we access a single Oak repository from multiple nodes over the
network? I believe so, but please confirm, as the docs don't seem to make
this very clear.

2. How stable is RDBDocumentStore, and how well does it perform? We plan
to use it because we have skills and procedures for maintaining and
backing up PostgreSQL databases, and I would prefer not to complicate the
deployment with MongoDB. The bulk of the analytical data will be stored in
a separate PostgreSQL instance and not in JCR. We need JCR to display
pages and some content (email templates, report templates, terms of use
and legal pages, etc.). It will not have to support millions of hits per
day.

3. Can we have synced copies of the repository?

We have a service that generates PDF reports: Report Rendering Service.
We would like to keep the templates and content in JCR but would like
the service to work without a constant connection to the Oak repository.

The content changes once a day, and Report Rendering Service should work
without a connection to the repository. However, there might be times
when we need to force the sync before we deploy a new version of the
service.

Secondary Node seems like the best solution for this use case [1].
The FileVault and ColdStandby [2] features seem to allow for this use case
as well.

3.a Can Secondary Node run disconnected?

3.b Can Secondary Node work with RDBDocumentStore?

3.c In case we can sync the repository and some content is versionable,
can we select which version to use? I imagine we can with the Secondary
Node solution.


[1] https://jackrabbit.apache.org/oak/docs/nodestore/document/secondary-store.html

[2] https://jackrabbit.apache.org/oak/docs/coldstandby/coldstandby.html


Regards,

Ioan Eugen Stan



Attachment: deployment-diagram.png (74K)
Re: Using Oak for distributed system

Clay Ferguson
Ioan,
Just my 2 cents' worth: even if JCR itself doesn't offer the advanced
'disconnected mode' you are looking for, I'd say probably neither does any
RDBMS solution. So even if you have to build some of what you need
yourself (in house), you are still better off building on top of JCR's
capabilities than on plain MongoDB, an RDBMS, or some other NoSQL
solution, because JCR gives you a huge head start towards what you need
compared to anything non-JCR.
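
To make the "build it yourself" part concrete, here is a minimal Java
sketch of a local snapshot that a service like the Report Rendering
Service could read from while the repository is unreachable. This is not
an Oak or JCR API: TemplateSource and TemplateSnapshotCache are
hypothetical names, the snapshot lives only in memory, and how the
templates are actually read from the repository is left to the caller.

```java
import java.io.IOException;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical abstraction: "read all templates from the repository". */
interface TemplateSource {
    Map<String, String> fetchAll() throws IOException;
}

/** Serves templates from the last successful snapshot; syncing is explicit. */
final class TemplateSnapshotCache {

    /** Immutable holder for one snapshot and the time it was taken. */
    private static final class Snapshot {
        final Map<String, String> templates;
        final Instant takenAt;

        Snapshot(Map<String, String> templates, Instant takenAt) {
            this.templates = templates;
            this.takenAt = takenAt;
        }
    }

    private final TemplateSource source;
    // Starts empty; replaced atomically on every successful sync.
    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(Map.of(), Instant.EPOCH));

    TemplateSnapshotCache(TemplateSource source) {
        this.source = source;
    }

    /**
     * Tries to refresh the snapshot (daily job or forced before a deploy).
     * Returns true on success; on failure the old snapshot stays in place.
     */
    boolean sync() {
        try {
            Map<String, String> fresh = Map.copyOf(source.fetchAll());
            current.set(new Snapshot(fresh, Instant.now()));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Reads from the local snapshot only; never touches the source. */
    Optional<String> get(String path) {
        return Optional.ofNullable(current.get().templates.get(path));
    }

    /** Time of the last successful sync, or Instant.EPOCH if none yet. */
    Instant lastSync() {
        return current.get().takenAt;
    }
}
```

A real service would probably also persist the snapshot to disk so that
it survives a restart while the repository is down; that part is omitted
here.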

Best regards,
Clay Ferguson
[hidden email]


On Thu, May 18, 2017 at 8:40 AM, Ioan Eugen Stan <[hidden email]> wrote:

> [...]
Re: Using Oak for distributed system

chetan mehrotra
In reply to this post by Ioan Eugen Stan
Answers to a few of the queries inline.

> 1. Can we access a single Oak repository from multiple nodes over
> network? I believe so, but please confirm as the docs don't seem to make
> this very clear.

Yes, just spin up multiple Oak instances connected to the same DB or Mongo.

> 3. Can we have synced copies/repository?

Generally you would use sidegrade support for that [1], but that's not
incremental, so it would do a complete diff on each run. There are plans
to add secondary node store support to sidegrade, which would allow
incremental updates of the secondary store using the oak-upgrade tooling.

> 3.a Can Secondary Node run disconnected?

Yes, it's a normal SegmentNodeStore (with some added meta props), so it
can be used in disconnected mode.

> 3.b Can Secondary node work with RDBDocumentStore ?

Yes

> 3.c. In case we can sync the repository and some content is versionable,
> can we select what version to use? I imagine we can with Secondary Node
> solution.

For that you need to rely on the JCR API. At the Secondary Node level it's all content.

Chetan Mehrotra
[1] https://jackrabbit.apache.org/oak/docs/migration.html#Sidegrade
Re: Using Oak for distributed system

Michael Marth
Hi,

Re 2: the performance of any system depends heavily on the workload. I would advise not to take anyone’s word for it, but to run tests that reflect your intended usage. The Oak project has a good set of performance tests that can get you started (sorry - currently offline, but IIRC it’s the “benchmark” package in oak-run).
If your final deployment scenario includes Sling, you might just run an HTTP performance tool against Sling.
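
As a rough idea of what "tests that reflect your intended usage" could
look like in plain Java, here is a small timing helper. It is not the
oak-run benchmark code; LatencyProbe and its methods are hypothetical
names, and the workload (for example an HTTP GET against a Sling page or
a repository read) is whatever Runnable the caller passes in.

```java
import java.util.Arrays;

/** Minimal latency measurement helper; all names are illustrative. */
final class LatencyProbe {

    private LatencyProbe() {
    }

    /**
     * Runs the workload warmup + iterations times and returns the
     * latencies of the measured iterations in nanoseconds, sorted
     * in ascending order.
     */
    static long[] measure(Runnable workload, int warmup, int iterations) {
        // Warm-up runs are executed but not recorded.
        for (int i = 0; i < warmup; i++) {
            workload.run();
        }
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            workload.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile over an ascending-sorted array, p in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, Math.min(sorted.length - 1, rank - 1))];
    }
}
```

Looking at a high percentile (say the 95th) rather than only the average
tends to show more clearly how a store behaves under the intended
workload.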

Michael




On 18/05/17 16:37, "Chetan Mehrotra" <[hidden email]> wrote:

> [...]
Re: Using Oak for distributed system

Ioan Eugen Stan
Hi,

Thank you for all your great advice. I have started working on this and
should have some results in the coming weeks.

I'll run the benchmarks and I'll try to make some time to post the
results online.

In the meantime I need to prepare the deployment. I decided to use
Apache Karaf as a container, and I was a bit surprised that I could not
find a feature for Jackrabbit / Jackrabbit Oak. There usually are
features for most of the applications I have used so far.

Do you know if there is something like that available?


Regards,


On 23.05.2017 04:36, Michael Marth wrote:

> [...]

