Hadoop is better suited to batch processing, which means high-latency data access. You should take a look at some NoSQL systems, such as document-oriented databases. It's hard to say more without knowing what your data looks like.
The number one rule of NoSQL design is to define your query scenarios first. Once you really understand how you want to query the data, you can look at the various NoSQL solutions out there. The default unit of distribution is the key, so you need to remember that you must be able to split your data effectively across your nodes; otherwise you end up with a "horizontally scalable" system where all the work is still done by one node (albeit with better queries, depending on the case).
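To make the key-distribution point concrete, here is a minimal sketch of hash-based sharding (the names `shard_for` and the four-node setup are illustrative, not from any particular database): a well-distributed key spreads records evenly, while a skewed key would pile everything onto one node.

```python
import hashlib

def shard_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node by hashing it, so reads and writes
    spread evenly instead of all landing on a single machine."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Count how 10,000 document keys land across 4 nodes.
keys = [f"doc-{i}" for i in range(10_000)]
counts = [0] * 4
for k in keys:
    counts[shard_for(k, 4)] += 1
print(counts)  # roughly 2,500 per node
```

If the key were instead something like a constant prefix or today's date, every record would hash to the same few shards and one node would do all the work, which is exactly the trap described above.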
You also need to think back to the CAP theorem: most NoSQL databases are eventually consistent (CP or AP), while traditional relational DBMSs are CA. This affects how you handle data and how you create certain things; key generation, for example, can get tricky. Obviously, plain files in folders are a somewhat different story.
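One reason key generation gets tricky: an RDBMS can hand out sequential IDs from a single counter, but in an eventually consistent, multi-node store there is no single authority, so each writer typically mints its own collision-resistant key instead. A minimal sketch using random UUIDs (one common approach, not the only one):

```python
import uuid

def new_key() -> str:
    """Generate a key any node can create independently, with no
    coordination and no realistic chance of collision."""
    return uuid.uuid4().hex  # random 128-bit ID as a 32-char hex string

a, b = new_key(), new_key()
print(a, b)
```

The trade-off is that such keys carry no ordering, so if you need time-sorted scans (as HBase row-key design often does) you would prefix or compose the key differently.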
Also remember that in some systems, such as HBase, there is no secondary-indexing concept (though I imagine you have the files indexed in your current Windows FS document store). All your indexes have to be built by your application logic, and any updates and deletes have to be managed accordingly. With Mongo you can create indexes on fields and query them relatively quickly, and there is also the option of integrating Solr with Mongo. In Mongo you aren't limited to querying by ID, as you are in HBase, which is a column-family (i.e. Google BigTable–style) database where you essentially have nested key-value pairs.
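To show what "indexes built by your application logic" means in practice, here is a sketch of an application-managed secondary index. Plain dicts stand in for two HBase-style tables (the table names, the `email` field, and the helper functions are all hypothetical): one table keyed by row key, and a second that maps a field value back to row keys, which your code must keep in sync on every write and delete.

```python
main_table = {}    # row_key -> record (primary, key-only lookups)
email_index = {}   # email -> set of row_keys (app-maintained index)

def put(row_key, record):
    old = main_table.get(row_key)
    if old is not None:  # on update, remove the stale index entry
        email_index[old["email"]].discard(row_key)
    main_table[row_key] = record
    email_index.setdefault(record["email"], set()).add(row_key)

def delete(row_key):
    record = main_table.pop(row_key, None)
    if record is not None:  # deletes must also clean the index
        email_index[record["email"]].discard(row_key)

def find_by_email(email):
    return [main_table[k] for k in email_index.get(email, ())]

put("row1", {"email": "a@example.com", "name": "Ann"})
put("row2", {"email": "b@example.com", "name": "Bob"})
put("row1", {"email": "c@example.com", "name": "Ann"})  # update moves the index entry
print(find_by_email("a@example.com"))  # -> []
print(find_by_email("c@example.com"))
```

Every code path that touches the data has to remember the index, which is exactly the maintenance burden that Mongo's built-in field indexes (or a Solr integration) take off your hands.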
So, once again, it comes down to your data: what you want to store, how you plan to store it, and, most importantly, how you want to access it. The Lily project looks very promising. In the work I'm involved with, we take a large amount of data from the web and we store it, analyze it, strip it down, parse it, stream it, update it, and so on. We don't use just one system but several, each best suited to its job. We use different systems at different stages of this process because that gives us fast access where we need it, the ability to stream and analyze data in real time and, importantly, a way to keep track of everything as it moves (data loss in a production system is a big deal). I use Hadoop, HBase, Hive, MongoDB, Solr, MySQL, and even good old text files. Remember that taking a system built on these technologies to production is a bit harder than installing Oracle on a server: some releases are not that stable, and you really need to test first. In the end, it comes down to how much risk the business will tolerate and how mission-critical your system is.
Another path that no one has mentioned so far is NewSQL, i.e. horizontally scalable RDBMSs... There are a few out there, such as MySQL Cluster (I think) and VoltDB, which may suit your case. But again, it depends on your data (are these document files, or text documents with information about products, accounts, or tools, or something else...).
Again, it comes down to understanding your data and your access patterns. NoSQL systems are also "Non-Rel", i.e. non-relational, and are therefore better suited to non-relational datasets. If your data is inherently relational, and you need SQL query features that really have to do things like Cartesian products (a.k.a. joins), then you might be better off sticking with Oracle and investing some time in indexing, sharding, and performance tuning.
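To illustrate what you give up without joins: in a store with no join support, a query like "orders with their customer names" becomes application code. A minimal sketch, with made-up `customers` and `orders` records standing in for two fetched datasets:

```python
# Two datasets that a single SQL JOIN would combine server-side.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [{"order_id": 10, "customer_id": 1, "total": 99.0},
          {"order_id": 11, "customer_id": 2, "total": 45.5}]

# Build a lookup first to avoid a nested-loop (Cartesian-style) scan.
by_id = {c["id"]: c for c in customers}
joined = [{"order_id": o["order_id"],
           "customer": by_id[o["customer_id"]]["name"],
           "total": o["total"]}
          for o in orders]
print(joined)
```

Doing this once is trivial; doing it for every relational query in your application, with proper error handling and at scale, is the hidden cost that makes staying on a tuned RDBMS attractive for inherently relational data.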
My advice would be to actually play around with a few different systems. Take a look at:
MongoDB - Document - CP
CouchDB - Document - AP
Cassandra - Column Family - Available & Partition-Tolerant (AP)
VoltDB - a really good product: a relational database that is distributed and might work for your case (it may be an easier move). They also seem to offer enterprise support, which may be more appropriate for production (i.e. it gives business users a sense of security).
Anyway, that's my 2c. Playing around with the systems is really the only way to find out what will actually work for your case.