Concurrent use of Repositories


Concurrent use of Repositories

Michael Harrison
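We are just starting to use Oak (version 1.6.1) backed by a file store. We have two concurrent processes trying to access the data in the file store, each process creating its own Repository instance. As a simple test case, we have the following. Process1 has this code:


import java.io.File;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.oak.Oak;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.segment.SegmentNodeStore;
import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;

public class HoldSessionOpen {

    private static final String workspace = "default";
    private static final String repodir = "\\tmp\\oak-repo";

    public static void main(String[] args) throws Exception {
        // Opening the FileStore appears to lock the store directory: a second
        // process opening the same directory waits here until this one closes it.
        FileStore filestore = FileStoreBuilder.fileStoreBuilder(new File(repodir)).build();
        try {
            SegmentNodeStore sns = SegmentNodeStoreBuilders.builder(filestore).build();
            Repository repository = new Jcr(new Oak(sns)).createRepository();
            SimpleCredentials creds = new SimpleCredentials("admin", "admin".toCharArray());
            Session session = repository.login(creds, workspace);
            try {
                System.out.println("Holding session: " + session);
                Node root = session.getRootNode();
                // The loop only keeps the store open for a while; all changes stay
                // transient until the single save() after the loop.
                for (int i = 0; i < 100000; i++) {
                    if (root.hasNode("hold")) {
                        Node hold = root.getNode("hold");
                        long count = hold.getProperty("count").getLong();
                        hold.setProperty("count", count + 1);
                        System.out.println("found the hold node, count = " + count);
                    } else {
                        System.out.println("creating the hold node");
                        root.addNode("hold").setProperty("count", 1);
                    }
                }
                session.save();
            } finally {
                session.logout();
            }
        } finally {
            filestore.flush();
            filestore.close();
        }
    }
}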


and process2 has this code:


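import java.io.File;

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

import org.apache.jackrabbit.oak.Oak;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.segment.SegmentNodeStore;
import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.segment.file.FileStoreBuilder;

public class AccessRepo {

    private static final String workspace = "default";
    private static final String repodir = "\\tmp\\oak-repo";

    public static void main(String[] args) throws Exception {
        // While HoldSessionOpen has the same directory open, this call waits
        // for the store lock instead of returning.
        FileStore filestore = FileStoreBuilder.fileStoreBuilder(new File(repodir)).build();
        try {
            SegmentNodeStore sns = SegmentNodeStoreBuilders.builder(filestore).build();
            Repository repository = new Jcr(new Oak(sns)).createRepository();
            SimpleCredentials creds = new SimpleCredentials("admin", "admin".toCharArray());
            Session session = repository.login(creds, workspace);
            try {
                Node root = session.getRootNode();
                if (root.hasNode("hello")) {
                    Node hello = root.getNode("hello");
                    long count = hello.getProperty("count").getLong();
                    hello.setProperty("count", count + 1);
                    System.out.println("found the hello node, count = " + count);
                } else {
                    System.out.println("creating the hello node");
                    root.addNode("hello").setProperty("count", 1);
                }
                session.save();
            } finally {
                session.logout();
            }
        } finally {
            filestore.flush();
            filestore.close();
        }
    }
}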

If I open up two consoles and start process1 in one and then process2 in the other, process2 is blocked until process1 terminates.
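That process2 waits, rather than failing, is consistent with the segment FileStore taking an exclusive OS-level file lock inside the store directory when it is opened. Assuming the locked file is named `repo.lock` (an assumption, not something stated in this thread), the sketch below uses only `java.nio` to check without blocking whether the store is currently held. `StoreLockProbe` and `isLocked` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class StoreLockProbe {

    /**
     * Returns true if the (assumed) repo.lock file in storeDir is currently locked.
     * Uses tryLock(), which returns immediately, instead of lock(), which blocks.
     */
    public static boolean isLocked(File storeDir) throws IOException {
        File lockFile = new File(storeDir, "repo.lock"); // assumed lock-file name
        if (!lockFile.exists()) {
            return false; // no lock file: the store has not been opened here yet
        }
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock(); // null if another process holds the lock
            if (lock == null) {
                return true;
            }
            lock.release(); // we got it, so nobody held it; give it back at once
            return false;
        } catch (OverlappingFileLockException e) {
            // Thrown when this same JVM already holds a lock on the file.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : "oak-repo");
        System.out.println(isLocked(dir) ? "store is in use" : "store is free");
    }
}
```

The probe is meant to run as its own process, for example `java StoreLockProbe \tmp\oak-repo` while process1 is running. On some platforms, closing a channel releases every lock the JVM holds on that file, so it should not be called from inside the process that has the store open.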


The actual use case is that one process responds to REST requests for data in the repository while another process uploads content to the same repository. We see the same behavior: the process that starts second is blocked until the process that starts first completes.


Is there any way to have concurrent access to the same data through two repositories?


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.


Re: Concurrent use of Repositories

chetan mehrotra
On Tue, Jun 6, 2017 at 10:56 AM, Michael Harrison
<[hidden email]> wrote:
> Is there any way to have concurrent access to the same data through two repositories?

SegmentNodeStore only supports single-node setups. For a multi-node
setup (a cluster) you would need to use a DocumentNodeStore connected
to MongoDB or to a relational database (RDBMS).

Chetan Mehrotra

Re: Concurrent use of Repositories

Michael Harrison
Chetan,


Thank you for your prompt response.


I'm not quite sure of the terminology, but I think we have only one node. All the data are stored in one repository in one file directory on one machine (for our initial implementation).


This is the use case. We have a REST UI that provides access to digital asset data in the repository. Users can search or browse for assets, move them to different categories, change metadata or the assets themselves, and so on. They can also initiate a bulk upload of digital asset data from an FTP site to the repository, for which we set up and execute a separate job. The API supporting the UI has a Repository object for accessing the data, and the upload job has its own Repository object for the same purpose. Both use the same Repository and Session protocol as shown in the example code. The upload job blocks trying to create its Repository, as I have described.


We are new to Oak and are still blundering about in the dark trying to figure out how to use it properly. Could you suggest the best approach to support our use case?


Mike Harrison

________________________________
From: Chetan Mehrotra <[hidden email]>
Sent: Monday, June 5, 2017 11:08:43 PM
To: [hidden email]
Subject: Re: Concurrent use of Repositories

On Tue, Jun 6, 2017 at 10:56 AM, Michael Harrison
<[hidden email]> wrote:
> Is there any way to have concurrent access to the same data through two repositories?

SegmentNodeStore only supports single-node setups. For a multi-node
setup (a cluster) you would need to use a DocumentNodeStore connected
to MongoDB or to a relational database (RDBMS).

Chetan Mehrotra


Re: Concurrent use of Repositories

Ioan Eugen Stan
Hello Michael,


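Apache Oak stores data in node stores [1]. The SegmentNodeStore keeps
data in files on local disk (your case), while the DocumentNodeStore
keeps it in a document store backed by MongoDB, an RDBMS (PostgreSQL,
for example) or memory (no persistence).

Only the MongoDB and RDBMS document stores can be used from different
processes at the same time; an in-memory store lives inside a single JVM.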

That said, your choices are:

1. Keep using the file-based repository and move the bulk upload job
into your application, triggered by a scheduler or a remote call.

2. Use a MongoDB- or RDBMS-backed DocumentNodeStore and access the
repository from multiple applications at the same time.
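Choice 1 can be sketched roughly as follows, assuming one long-lived store per process: the application opens the store once at startup, a background executor runs the bulk upload, and REST handlers keep reading through the same object. To keep the sketch runnable without Oak, `AssetStore` is a hypothetical in-memory stand-in for the shared `Repository`; `SingleProcessAssets`, `startBulkUpload` and `readAsset` are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleProcessAssets {

    /** Hypothetical stand-in for the one Repository the application opens at startup. */
    static final class AssetStore {
        private final Map<String, String> assets = new ConcurrentHashMap<>();
        void put(String path, String data) { assets.put(path, data); }
        String get(String path) { return assets.get(path); }
    }

    private final AssetStore store = new AssetStore(); // opened once, shared by all tasks
    private final ExecutorService uploadExecutor = Executors.newSingleThreadExecutor();

    /** Called from the REST layer: queues a bulk upload and returns immediately. */
    public Future<Integer> startBulkUpload(List<String> names) {
        return uploadExecutor.submit(() -> {
            // With Oak, this task would log in its own Session from the shared
            // Repository, write each asset, and call save().
            for (String name : names) {
                store.put("/assets/" + name, "content of " + name);
            }
            return names.size();
        });
    }

    /** Called from the REST layer for a read request; uses the same store. */
    public String readAsset(String name) {
        return store.get("/assets/" + name);
    }

    public void shutdown() {
        uploadExecutor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SingleProcessAssets app = new SingleProcessAssets();
        Future<Integer> upload = app.startBulkUpload(Arrays.asList("a.jpg", "b.jpg"));
        System.out.println("uploaded " + upload.get() + " assets");
        System.out.println(app.readAsset("a.jpg"));
        app.shutdown();
    }
}
```

The single-thread executor runs uploads one at a time without holding up the REST thread that started them. With a real Oak `Repository`, each task should log in its own `Session`, since JCR sessions are not meant to be shared between threads.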


Page [2] shows how to create both a SegmentNodeStore and a DocumentNodeStore.
Search the Oak code base (the tests and the oak-benchmarks module) for more examples.

Good luck,


[1] https://jackrabbit.apache.org/oak/docs/nodestore/overview.html

[2] https://jackrabbit.apache.org/oak/docs/construct.html



Re: Concurrent use of Repositories

chetan mehrotra
In reply to this post by Michael Harrison
On Tue, Jun 6, 2017 at 8:10 PM, Michael Harrison
<[hidden email]> wrote:
> The API supporting the UI has a Repository object for accessing the data, and the upload job has its own Repository object for the same purpose. Both use the same Repository and Session protocol as shown in the example code

As Ioan also mentioned, SegmentNodeStore only supports access from a
single process. So if you want to use it, run the upload logic in the
same process as well, to keep the setup simple.

Chetan Mehrotra