Implementing row-level access permissions

I am evaluating indexing engines, in particular Apache Lucene Solr. We are willing to use it for our searches, but one of the problems that our framework's built-in search solves is row-level access.

Solr does not provide record-level access control out of the box:

<...> Solr does not concern itself with security either at the document level or the communication level.

And in the section on document-level security: http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

There are a few suggestions: either use Manifold CF (which is poorly documented and seems to be in a pre-beta stage), or write your own request handler / component (that part is marked as a stub). I think the latter would have a bigger impact on performance.

Therefore, I suppose that not much has been done in this field.

The recently released Solr 4.0 introduced joins between two indexed entities. A join may seem like a good idea, since our framework also performs a join to find out whether a record is accessible to the user. The problem is that sometimes we do an inner join and sometimes an outer join, depending on whether the security setting is optimistic (everything is allowed unless explicitly forbidden) or pessimistic (everything is forbidden unless explicitly allowed).

To better understand what our structure looks like:

Documents

DocumentNr | Name
------------------
1          | Foo
2          | Bar

DocumentRecordAccess

DocumentNr | UserNr | AllowRead | AllowUpdate | AllowDelete
------------------------------------------------------------
1          | 1      | 1         | 1           | 0

So, for example, the query generated for Documents in pessimistic security mode would be:

 SELECT * FROM Documents AS d INNER JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1 

This will return only Foo, but not Bar. And in the optimistic mode:

 SELECT * FROM Documents AS d LEFT JOIN DocumentRecordAccess AS dra ON dra.DocumentNr=d.DocumentNr AND dra.AllowRead=1 AND dra.UserNr=1 

This returns both Foo and Bar.
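To make the difference between the two modes concrete, here is a minimal sketch that loads the two sample tables into an in-memory SQLite database and runs both joins. SQLite and the helper name `visible_names` are assumptions made only for illustration; the framework's actual database and query generator are not shown in the question.

```python
import sqlite3

# In-memory database holding the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (DocumentNr INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DocumentRecordAccess (
        DocumentNr INTEGER, UserNr INTEGER,
        AllowRead INTEGER, AllowUpdate INTEGER, AllowDelete INTEGER
    );
    INSERT INTO Documents VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO DocumentRecordAccess VALUES (1, 1, 1, 1, 0);
""")


def visible_names(join_kind, user_nr):
    """Hypothetical helper: run the question's query with the given join type."""
    if join_kind not in ("INNER", "LEFT"):
        raise ValueError("join_kind must be 'INNER' or 'LEFT'")
    sql = f"""
        SELECT d.Name
        FROM Documents AS d
        {join_kind} JOIN DocumentRecordAccess AS dra
          ON dra.DocumentNr = d.DocumentNr
         AND dra.AllowRead = 1
         AND dra.UserNr = ?
        ORDER BY d.DocumentNr
    """
    return [row[0] for row in conn.execute(sql, (user_nr,))]


pessimistic = visible_names("INNER", 1)  # only explicitly allowed rows
optimistic = visible_names("LEFT", 1)    # rows without an access record are kept
print(pessimistic)  # ['Foo']
print(optimistic)   # ['Foo', 'Bar']
```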

Returning to my question - maybe someone has already done this and can share their understanding and experience?

lucene solr
2 answers

I am afraid that there is no easy solution. You will have to sacrifice something to make the ACL work with the search.

  • If your corpus is small (say, up to 10K documents), you can build a cached bit set of forbidden (or allowed, whichever is less verbose) documents and send the corresponding filter query (+*:* -DocumentNr:1 ... -DocumentNr:X). Needless to say, this does not scale. Sending large queries will make the search a bit slower, but this is manageable (up to a point, of course). Query parsing is cheap.

  • If you can somehow group these documents and apply ACLs to groups of documents, this will cut down the query length, and the above approach will fit perfectly. This is pretty much what we are using: our solution implements a taxonomy and enforces taxonomy permissions via an fq query.

  • If you do not need to show the total number of result sets, you can run your query and filter the result set on the client side. Again, not perfect.

  • You can also denormalize your data structures and keep both tables flattened in one document, like this:

    DocumentNr: 1
    Name: Foo
    Allowed_users: u1, u2, u3 (or Forbidden_users: ...)

    The rest is as simple as sending a user ID with your request.

    The above is only viable if the ACLs rarely change and you can afford to reindex the entire corpus when they do.

  • You can write a custom query filter that keeps a cached BitSet of allowed or forbidden documents per user (or group?), retrieved from the database. This requires not only giving the Solr webapp access to the database, but also extending / repackaging the .war that ships with Solr. While this is relatively easy, the harder part is cache invalidation: the main application must somehow signal the Solr application when the ACL data changes.
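The caching and invalidation idea from the last option can be sketched independently of Solr. The following is plain Python rather than an actual Solr plugin (which would be written in Java), and the class `AclCache`, its methods, and the loader callback are all hypothetical names used only to show the flow: load a user's forbidden set once, serve it from the cache, and drop it when the main application signals an ACL change.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional


class AclCache:
    """Hypothetical per-user cache of forbidden document numbers."""

    def __init__(self, loader: Callable[[int], Iterable[int]]):
        self._loader = loader  # e.g. a database lookup in a real system
        self._cache: Dict[int, FrozenSet[int]] = {}
        self.loads = 0  # counts how often the loader was actually called

    def forbidden(self, user_nr: int) -> FrozenSet[int]:
        # Load from the backing store only on a cache miss.
        if user_nr not in self._cache:
            self._cache[user_nr] = frozenset(self._loader(user_nr))
            self.loads += 1
        return self._cache[user_nr]

    def invalidate(self, user_nr: Optional[int] = None) -> None:
        # Called when the main application reports an ACL change.
        if user_nr is None:
            self._cache.clear()
        else:
            self._cache.pop(user_nr, None)


# Stand-in for the ACL table: user 1 may not see document 2.
acl_table = {1: {2}}
cache = AclCache(lambda user_nr: acl_table.get(user_nr, set()))

first = cache.forbidden(1)   # loaded from the table
acl_table[1] = {2, 3}        # the ACL changes in the main application
stale = cache.forbidden(1)   # still the old set: nobody signalled the change
cache.invalidate(1)          # the signal arrives
fresh = cache.forbidden(1)   # reloaded, now reflects the change
```

The stale read in the middle is exactly the window the answer warns about: until the invalidation signal arrives, searches are filtered with outdated permissions.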

Options 1 and 2 are probably more reasonable if you can put Solr and your application in the same JVM and use the javabin driver.
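For options 1 and 2, the filter query could be assembled along these lines. This is only a sketch: `exclusion_fq` is a hypothetical helper, `GroupNr` is an assumed field name for the grouped variant, and the `q` value is a made-up example. Only the query string is built here; sending it to Solr is left out.

```python
from urllib.parse import urlencode


def exclusion_fq(field, forbidden_values):
    """Build '+*:* -field:a -field:b ...' from a collection of integer IDs."""
    # int() rejects anything that is not a plain number, so no escaping is needed.
    clauses = [f"-{field}:{int(value)}" for value in sorted(forbidden_values)]
    return " ".join(["+*:*"] + clauses)


# Option 1: exclude individual documents.
doc_fq = exclusion_fq("DocumentNr", [3, 1])
# Option 2: exclude whole groups (GroupNr is an assumed field name).
group_fq = exclusion_fq("GroupNr", [7])

# The filter goes into the fq parameter next to the user's actual query.
query_string = urlencode({"q": "Name:Foo", "fq": doc_fq})

print(doc_fq)    # +*:* -DocumentNr:1 -DocumentNr:3
print(group_fq)  # +*:* -GroupNr:7
```

The grouped variant produces one clause per group instead of one per document, which is what keeps the query short.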

It is hard to advise further without knowing the specifics of the corpus / ACLs.


I agree with what mindas suggested (option 4). I implemented my solution the same way, the difference being that I have several different types of ACLs: at the group level, at the user level, and even at the document level (private access).
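With several ACL types flattened into each document, the filter sent with a search might look like the sketch below. The field `Allowed_users` comes from the earlier answer; `Allowed_groups` and `Owner` (for private documents) are assumed names, as is the helper `acl_fq`, since the actual schema is not given here.

```python
import re

# Only simple identifiers are accepted, so the quoted values cannot break the query.
_TOKEN = re.compile(r"^[A-Za-z0-9_]+$")


def _checked(token):
    if not _TOKEN.match(token):
        raise ValueError(f"unsupported characters in {token!r}")
    return token


def acl_fq(user_id, group_ids=()):
    """Match documents the user may read via user, group, or owner ACLs."""
    user = _checked(user_id)
    clauses = [f'Allowed_users:"{user}"']
    groups = [f'"{_checked(g)}"' for g in group_ids]
    if groups:
        clauses.append("Allowed_groups:(" + " OR ".join(groups) + ")")
    clauses.append(f'Owner:"{user}"')  # private documents of this user
    return " OR ".join(clauses)


fq = acl_fq("u1", ["g1", "g2"])
print(fq)  # Allowed_users:"u1" OR Allowed_groups:("g1" OR "g2") OR Owner:"u1"
```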

The solution is working fine. But the main problem in my case is that ACLs change often and they need to be updated in the index, while search performance should not be affected either.

I am trying to manage this with load balancing and adding multiple nodes to the cluster.

mindas, unicron, could you share your thoughts on this?
