Choosing a Strategy for a BI Module

The company I work for builds a content management system (CMS) with various add-ons for publishing, e-commerce, online printing, etc. We are now adding a "reporting module", and I need to work out which strategy to follow. Such a reporting module is also known as Business Intelligence, or BI.

The module is expected to track the loading of items, allow searching over the accumulated data, and display various reports from it. Exactly what data is accumulated matters less, because in the long run we may want to track whatever we consider necessary and report on that.

Roughly speaking, we have two options.

Option 1 is to write our own solution based on Apache Solr (in particular, using https://issues.apache.org/jira/browse/SOLR-236 ). The advantages of this approach:

  • free / open source / good quality
  • we use Solr / Lucene elsewhere, so we know the domain well.
  • full flexibility over what is indexed, since we can accept the incoming data (in XML format), push it through XSLT and pass it to Solr (see the indexing sketch after this list)
  • full flexibility in displaying search results: as with indexing, we could create a custom XSLT search template and render the results in any format we consider necessary.
  • our frontend developers know XSLT well, so adapting this mechanism for another client should be relatively easy.
  • Solr offers the real-time / full-text / faceted search that we absolutely need. A quick prototype (based on Solr, 1M records) was able to deliver search results in 55 ms. Our estimated maximum is about 1 billion records (not that much for a typical BI application), and if that becomes a problem we can always look at SolrCloud, etc.
  • There are companies that do very similar things using Solr (like Honeycomb Lexicon).
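To illustrate the indexing pipeline from the points above, here is a minimal sketch in Java: incoming XML is pushed through an XSLT stylesheet into Solr's <add><doc> update format and POSTed to the /update handler. The URL, file names and stylesheet are assumptions for the sketch, not our actual setup.

    import java.io.OutputStream;
    import java.io.StringWriter;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class SolrIngest {
        public static void main(String[] args) throws Exception {
            // 1. Transform the incoming XML into Solr's <add><doc> update
            //    format using a client-specific stylesheet (hypothetical file).
            Transformer xslt = TransformerFactory.newInstance()
                .newTransformer(new StreamSource("client-to-solr.xsl"));
            StringWriter solrXml = new StringWriter();
            xslt.transform(new StreamSource("incoming-event.xml"),
                           new StreamResult(solrXml));

            // 2. POST the result to Solr's update handler and commit.
            URL update = new URL("http://localhost:8983/solr/update?commit=true");
            HttpURLConnection http = (HttpURLConnection) update.openConnection();
            http.setRequestMethod("POST");
            http.setDoOutput(true);
            http.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream out = http.getOutputStream()) {
                out.write(solrXml.toString().getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("Solr responded: " + http.getResponseCode());
        }
    }

The query side is equally plain: the faceted full-text search from the prototype amounts to a GET such as http://localhost:8983/solr/select?q=error&facet=true&facet.field=client (field names hypothetical).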

The disadvantages of this approach are:

  • SOLR-236 may or may not be stable; moreover, it is not yet clear when, or whether, it will be included in an official release
  • there may be BI functionality we would have to write ourselves, which sounds a bit like reinventing the wheel
  • the biggest problem is that we do not know what we may need in the future (for example, integration with some piece of BI software, export to Excel, etc.)

Option 2 is to integrate with some free or commercial piece of BI software. So far I have looked at Wabit and plan to look at QlikView, and possibly others. The advantages of this approach:

  • no need to reinvent the wheel; the software is (hopefully) tried and tested
  • it would save us time that we could spend on the problems we actually specialize in

The disadvantages:

  • since we are a Java shop and our solution is cross-platform, we would have to rule out many of the options on the market
  • I am not sure how flexible BI software can be; it will take time to go through the various BI offerings to see whether they can handle flexible indexing, real-time / full-text search, fully customizable result output, etc.
  • I was told that the open source BI offerings are not mature enough, while commercial BI suites (SAP and others) cost a fortune, with licenses starting at tens of thousands of pounds/dollars. I am not opposed to a commercial choice as such, but it would add to the total price, which could easily become too high
  • I am not sure how well BI tools are designed to work with schema-less data

I am definitely not the best candidate to find the most suitable integration option on the market (mainly due to my lack of knowledge in the BI field), but the decision needs to be made quickly.

Has anyone been in a similar situation who could advise which route to take, or better still, point out the possible pluses/minuses of option 2? The biggest problem here is that I don't know what I don't know ;)

3 answers

I spent some time playing with QlikView and Wabit, and I must say, I am very disappointed.

I had the expectation that the entire BI industry had some real science behind it, but from what I have found, it is just a buzzword. This MSDN article was an eye-opener. The whole BI business boils down to taking data from well-normalized schemas (they call this OLTP), putting it into less-normalized schemas (OLAP, snowflake or star schemas), and creating indexes for every aspect you might want to query (the industry jargon for this is a data cube). The rest is just scripting to get pretty charts.
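To make that concrete, here is a toy sketch (Java 16+, with invented names) of what a cube essentially precomputes: the same fact rows aggregated along different combinations of dimensions. A real OLAP engine does this at scale with clever storage, but the computational idea is just group-by plus aggregate.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Toy illustration of an OLAP "cube": pre-aggregated group-bys over a
    // fact table. SaleFact, region and product are hypothetical names.
    public class CubeSketch {
        record SaleFact(String region, String product, double amount) {}

        public static void main(String[] args) {
            List<SaleFact> facts = List.of(
                new SaleFact("EU", "CMS", 100.0),
                new SaleFact("EU", "BI",  250.0),
                new SaleFact("US", "CMS", 300.0));

            // One "level" of the cube: total amount per (region, product).
            Map<String, Double> byRegionProduct = facts.stream()
                .collect(Collectors.groupingBy(
                    f -> f.region() + "/" + f.product(),
                    Collectors.summingDouble(SaleFact::amount)));

            // A coarser roll-up: total per region ("all products" slice).
            Map<String, Double> byRegion = facts.stream()
                .collect(Collectors.groupingBy(
                    SaleFact::region,
                    Collectors.summingDouble(SaleFact::amount)));

            // Totals: EU/CMS=100.0, EU/BI=250.0, US/CMS=300.0 (order may vary)
            System.out.println(byRegionProduct);
            // Totals: EU=350.0, US=300.0
            System.out.println(byRegion);
        }
    }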

Well, I know I am simplifying here, and I may have missed a lot of aspects (good reports? export to Excel? predictions?), but from a computer-science point of view I just don't see anything beyond a database index.

I was told that some BI tools support compression; Lucene supports that too. I was told that some BI tools can keep the entire index in memory; Lucene can do that as well (for instance, with a RAMDirectory).

Speaking of the two candidates (Wabit and QlikView): the first is simply immature (I got dozens of exceptions when trying to go beyond what their demo suggested), while the second only works under Windows (not great, but I could live with it), and integrating with it would probably require me to write VBScript (yuck!). I had to spend a couple of hours on the QlikView forums just to get a simple date-range control working, because the Personal Edition I downloaded does not support the demo projects available on their website. Don't get me wrong, both are good tools for what they were built for, but I just don't see the point of integrating with them, since I wouldn't gain much.

To hedge against Solr failing us, I will define an abstract API so that I can move all the data to a database that supports full-text queries if something goes wrong (a minimal sketch of such an interface follows). And if things get worse, I can always write the missing pieces on top of Solr/Lucene myself.
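Something along these lines is what I have in mind; every name here (ReportSearchService, SearchHit, etc.) is hypothetical, and the real interface will grow out of the actual report requirements. One implementation would wrap Solr; a fallback could wrap any database with full-text support.

    import java.util.List;
    import java.util.Map;

    // Minimal sketch of a backend-agnostic search abstraction (Java 16+).
    public interface ReportSearchService {

        // Index one record; 'fields' is the schema-less field/value map.
        void index(String id, Map<String, String> fields);

        // Full-text query with optional facet fields, paged by offset/limit.
        SearchResult search(String query, List<String> facetFields,
                            int offset, int limit);

        record SearchHit(String id, Map<String, String> fields) {}

        // facetCounts maps each facet field to its value -> hit-count map.
        record SearchResult(long totalHits,
                            List<SearchHit> hits,
                            Map<String, Map<String, Long>> facetCounts) {}
    }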


If you are really in a scenario where you don't know what you don't know, I think it is best to study an open source tool and evaluate its usefulness before plunging into your own implementation. Using an open source solution may well help you develop your understanding of the features you actually need.

I previously worked with the open source suite Pentaho, and I honestly felt I understood a lot more after learning to use Pentaho's features for my own ends. As with most open source solutions, Pentaho seems a little scary at first, but I managed to get a good grip on it within a month. We also worked with the Kettle ETL tool and Mondrian cubes, which I think most serious BI tools these days are built on top of. These components used to be independent projects, but of late, I believe, Pentaho has taken all of them under its wing.

But once you are sure of what you need and what you don't, I would suggest building your own basic reporting tool on top of a Mondrian implementation (a minimal sketch of that follows). Customizing a sophisticated open source tool can be a real headache. In addition, there are licenses to worry about; I believe Pentaho is GPL-licensed, though you should check that.
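For what it is worth, here is a hedged sketch of what "building on top of Mondrian" can look like from Java via the olap4j API. The JDBC URL, schema file, and cube/measure/dimension names are invented for the example; it assumes the mondrian and olap4j jars are on the classpath.

    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.olap4j.CellSet;
    import org.olap4j.OlapConnection;
    import org.olap4j.OlapStatement;
    import org.olap4j.layout.RectangularCellSetFormatter;

    public class MondrianSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
            // Point Mondrian at the underlying JDBC store and cube schema XML.
            Connection c = DriverManager.getConnection(
                "jdbc:mondrian:Jdbc=jdbc:mysql://localhost/reports;"
                + "Catalog=file:reports-schema.xml;");
            OlapConnection conn = c.unwrap(OlapConnection.class);

            // MDX query: one measure on columns, one dimension on rows.
            OlapStatement stmt = conn.createStatement();
            CellSet cells = stmt.executeOlapQuery(
                "SELECT {[Measures].[Page Loads]} ON COLUMNS, "
                + "{[Client].Members} ON ROWS FROM [Reporting]");

            // Dump the result as a simple text grid.
            new RectangularCellSetFormatter(false)
                .format(cells, new PrintWriter(System.out, true));
            conn.close();
        }
    }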


You should first spell out clearly what your reports need to show. What reporting features do you need? What output formats do you want: display in a browser (HTML), PDF, or an interactive viewer (Java/Flash)? Where does the data live (database, Java objects, etc.)? Do you need ad-hoc reporting or just some hard-coded reports? These are just some of the questions.

Without answers to these questions it is hard to give a real recommendation, but my general recommendation would be i-net Clear Reports (formerly known as i-net Crystal-Clear). It is a Java tool. It is commercial, but it costs far less than SAP and co.

