Since SOAP is most commonly carried over HTTP, there are literally dozens of commercial and open source tools you can use to perform load and scalability tests.
What you want to do is make sure that whatever tool you choose meets the technical needs of exercising your interface correctly, matches your in-house skills, and produces the reports you need to determine success or failure in your testing. If the tool you pick is a mismatch on any of those three fronts, you could buy the most expensive tool on the market and hire the most expensive consulting firm to run it and still fail... sort of like driving nails with the back of a screwdriver instead of using the right tool (a hammer).
Most commercial tools on the market today have rent or lease options for short-term use, and then there are the open source options. Each tool has a design efficiency for the core tasks, such as script construction, test building, test execution, and analysis, and that efficiency differs from tool to tool. Commercial tools tend to be fairly evenly weighted across all of the tasks, while open source tools tend to demand a higher level of effort (LOE) for the bookend tasks of script creation and analysis.
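To make the "build and run" portion concrete, here is a minimal sketch of what any of those tools automates for you. Everything in it is hypothetical: `ENDPOINT`, `SOAP_ACTION`, and the envelope body are placeholders you would replace with your service's actual WSDL-defined operation, and a real tool would add pacing, ramp-up, error handling, and reporting on top of this.

```python
# Minimal sketch of the "build/run" tasks a load tool automates.
# ENDPOINT, SOAP_ACTION, and ENVELOPE are hypothetical placeholders.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "https://example.com/service"          # hypothetical endpoint
SOAP_ACTION = "urn:example#GetQuote"              # hypothetical operation
ENVELOPE = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body><GetQuote xmlns="urn:example"/></soap:Body>
</soap:Envelope>"""

def one_request() -> float:
    """Fire one SOAP POST and return its wall-clock response time."""
    req = urllib.request.Request(
        ENDPOINT, data=ENVELOPE,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": SOAP_ACTION})
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()                               # drain the response body
    return time.perf_counter() - start

# Run 20 concurrent virtual users for a fixed number of samples and
# collect the raw response times for later analysis.
with ThreadPoolExecutor(max_workers=20) as pool:
    times = list(pool.map(lambda _: one_request(), range(1000)))
print(f"samples={len(times)} mean={sum(times)/len(times):.3f}s")
```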
Given that you are going to be working with at least several million samples (assuming you need to run for at least 24 hours), you need to make sure that your analysis tools have a proven track record with large data sets. The long-established commercial performance testing tools have demonstrable track records at this level; open source is hit and miss, and in some cases the analysis becomes a roll-your-own proposition against the recorded response-time data. You can see how you could sink a lot of time into building your own analysis engine here!
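As a taste of what "roll your own" means, here is a sketch that summarizes a file of recorded response times. The file name `results.csv` and its one-float-per-line format are assumptions; a few million samples still fit in memory this way, but much beyond that you would want a streaming or approximate-percentile approach.

```python
# Roll-your-own analysis sketch: summarize recorded response times.
# Assumes a hypothetical file "results.csv" with one response time
# (seconds, as a float) per line.
import statistics

with open("results.csv") as f:
    samples = sorted(float(line) for line in f if line.strip())

def pct(p: float) -> float:
    """Nearest-rank percentile over the pre-sorted samples."""
    return samples[min(len(samples) - 1, int(p / 100 * len(samples)))]

print(f"n={len(samples)}")
print(f"mean={statistics.fmean(samples):.3f}s  median={pct(50):.3f}s")
print(f"p90={pct(90):.3f}s  p95={pct(95):.3f}s  p99={pct(99):.3f}s")
```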
Is what you want to do technically possible? Yes. Should you revisit your performance requirements first? Also yes, and here is why. I work with an organization that serves the needs of customers around the world through a web services interface today. Their back-end archive of transactional data is approaching 250 TB, built up over more than a decade of operation. On an hourly basis, the high-water mark over the past year was about 60,000 requests per hour. Projected across 24 hours, that still works out to less than 2 million requests per day. If you test far above that level and find problems, are you discovering genuine issues or technical ghosts, things that will never happen in production because of the gap between production volume and test volume? Properly modeling your workload is always difficult, but time spent modeling the right mix of transactions at the right volume is time your developers will not spend chasing performance ghosts and burning budget.
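A quick back-of-the-envelope check of those numbers, taking the observed high-water mark from above as the input:

```python
# Project the observed hourly high-water mark to a daily volume and a
# per-second arrival rate, assuming (pessimistically) that the peak
# hour is sustained for all 24 hours.
peak_per_hour = 60_000                 # observed high-water mark
daily = peak_per_hour * 24             # worst-case daily volume
per_second = peak_per_hour / 3600      # average arrival rate at peak

print(f"projected daily volume: {daily:,} requests")   # 1,440,000 (< 2M)
print(f"peak arrival rate: {per_second:.1f} req/s")    # ~16.7 req/s
```

Roughly 17 requests per second at peak is the volume your test needs to reflect; a test pinned at ten times that rate mostly manufactures the ghosts described above.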
Good luck