In a simplified form, my Java application can be described as follows:
It is a web application running on Tomcat that exposes a SOAP interface. The application uses JPA/Hibernate to store data in a MySQL database. The stored data consists of a list of users, a list of hosts, and a list of URIs pointing to huge files (around 10 GB) in the file system. The whole system consists of a central server, on which my application runs, and a group of worker hosts. A user can connect to the SOAP interface and ask the system to copy his files to a specific worker node, where he can then analyze the data (we cannot use NFS; we need to copy the data to the local storage of the worker host). The database then records, for each user, on which host each of his files is stored.
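To make the bookkeeping concrete, here is a minimal in-memory sketch of that mapping (user → file URI → worker host). The class and names are purely illustrative, not my actual JPA/Hibernate schema:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class PlacementModel {
    // user -> (file URI -> worker host); in the real system this lives
    // in the MySQL database, managed via JPA/Hibernate.
    static final Map<String, Map<String, String>> fileLocations = new HashMap<>();

    // Record that a user's file has been copied to a worker node.
    static void recordCopy(String user, String fileUri, String host) {
        fileLocations.computeIfAbsent(user, u -> new HashMap<>()).put(fileUri, host);
    }

    // Look up which worker currently holds a given file of a user.
    static Optional<String> hostOf(String user, String fileUri) {
        return Optional.ofNullable(
                fileLocations.getOrDefault(user, new HashMap<>()).get(fileUri));
    }

    public static void main(String[] args) {
        recordCopy("alice", "file:///data/alice/sample.bin", "worker-07");
        System.out.println(hostOf("alice", "file:///data/alice/sample.bin").orElse("none"));
        // prints "worker-07"
    }
}
```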
Currently, the system runs with one central server hosting the Tomcat application and the MySQL database, plus 10 worker hosts and about 30 users, each with roughly 100 files (10 GB on average) stored on the worker nodes.
But in the future I will have to scale the system by a factor of 100-1000. So I may have to deal with 10,000 users, 100,000 files, and 10,000 hosts. The system should also become fault tolerant, so that instead of a single central server (which is currently the single point of failure), there can be several. In addition, if one of the worker nodes fails, the system must notice, so that it does not try to copy files to that host.
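For illustration, here is a minimal sketch of the kind of failure detection I mean: a registry that probes workers with a plain TCP connect and only offers live hosts as copy targets. This is an assumption on my part, not production code; a real deployment would more likely rely on a cluster membership service (e.g. ZooKeeper or JGroups) than on ad-hoc socket probes:

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LivenessRegistry {
    private final Map<String, Boolean> alive = new ConcurrentHashMap<>();

    // Probe a worker by attempting a TCP connect within a timeout.
    boolean probe(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            alive.put(host, true);
        } catch (Exception e) {
            alive.put(host, false); // mark dead: exclude from copy targets
        }
        return alive.get(host);
    }

    // Only hosts that passed their most recent probe are eligible copy targets.
    List<String> eligibleTargets() {
        List<String> targets = new ArrayList<>();
        alive.forEach((host, up) -> { if (up) targets.add(host); });
        return targets;
    }

    public static void main(String[] args) {
        LivenessRegistry registry = new LivenessRegistry();
        // Port 1 on localhost is almost certainly closed, so the probe fails
        // and the host is excluded from the eligible targets.
        System.out.println(registry.probe("127.0.0.1", 1, 200));
        System.out.println(registry.eligibleTargets());
    }
}
```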
Now my question is: which Java technologies could I use to make the application scalable and fault tolerant? What architecture would you recommend? Should I have one huge database that stores all the information about all files, hosts, and users in one place, or would it be better to distribute the database across several hosts and synchronize them somehow?
java scalability redundancy
asmaier