Which programming language for compute-intensive trading portfolio simulations?

I am building a trading portfolio management system that is responsible for the production, optimization and simulation of portfolios of non-high-frequency trading strategies (working with 1-minute or 3-minute data bars, not tick data).

I plan to use Amazon Web Services to handle the entire load of the application.

I am considering four languages.

  • Java
  • C++
  • C#
  • Python

Here is the scale at the extreme end of the project's scope. It may never get this large, but in terms of requirements:

  • Weekly simulations of 10,000,000 trading systems.
  • (Each trading system is expected to have its own data mining methods, including feature selection algorithms that are extremely computationally expensive. Imagine 500-5000 features using wrapper methods. These are not run often by any means, but it is still a consideration.)
  • Real-time production of portfolios with 100,000 trading strategies.
  • Collecting 1-minute or 3-minute data from every stock / futures market around the world (about 100,000 instruments).
  • Portfolio optimization with up to 100,000 strategies (a pretty intensive algorithm).

Speed is a concern, but I believe Java can handle the load.

I just want to make sure Java can handle the above requirements comfortably. I do not want to do the project in C++, but I will if necessary.

C# is on the list because I thought it was a good alternative to Java, although I don't like Windows at all and would prefer Java if all else is equal.

Python: I have read a bit about PyPy and Psyco, which claim that Python can be optimized with JIT compilation to run at speeds close to C. That is pretty much the only reason it is on this list, apart from the fact that Python is a great language and would probably be the most enjoyable one to code in. That is not a deciding factor for this project, but it is a perk.

To summarize:

  • real time production
  • weekly simulations of a large number of systems
  • weekly / monthly portfolio optimization
  • a large number of connections to collect data from

Nothing here needs millisecond or even second-level latency. The only consideration is whether Java can handle this kind of load when spread across the required number of EC2 servers.

Thank you guys for your wisdom.

+6
java python trading
7 answers

I would choose Java for this task. Regarding RAM, the difference between Java and C++ is that in Java every object carries an overhead of 8 bytes (on the 32-bit Sun JVM, or the 64-bit Sun JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at this scale.

So development time is what matters more to me. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), whereas in Java you get a nice exception with a stack trace. I have always preferred that.

In C++ you can have collections of primitive types, which Java does not have. To get them in Java you have to use external libraries.

If you have real-time requirements, the Java garbage collector can be a nuisance, since it takes a few minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects at runtime, that should be fine too. It is just that your program can pause for garbage collection when you don't expect it.

+4

Choose the language you are most familiar with. If you know them all equally well and speed is a real concern, choose C.

+5

While I am a big fan of Python and personally don't much like Java, in this case I have to admit that Java is the right way to go.

For many projects, Python performance is simply not a problem, but in your case even minor performance penalties will add up very quickly. I know that this is not a real-time simulation, but even for batch processing this is still a factor to consider. If it turns out that the load is too large for one virtual server, an implementation that is twice as fast will halve the cost of a virtual server.

For many projects, I would also argue that Python lets you develop a solution faster, but here I am not sure that would be the case. Java has world-class development tools and enterprise-grade frameworks with out-of-the-box support for parallel processing and cross-server deployment. Python has solutions in this area too, but Java clearly has the edge. You also have architectural options with Java that Python cannot match, such as JavaSpaces.

I would say that C and C++ impose too much development overhead for a project like this. They are viable in the sense that, if you know these languages well, I am sure it would be doable, but apart from potential performance gains they bring nothing more to the table.

C# is just a rewrite of Java. That is not a bad thing if you are a Windows developer, and if you prefer Windows I would use C# rather than Java, but if you don't care about Windows there is no reason to care about C#.

+5

Write it in your preferred language. To me that sounds like Python. Once you have the system running, you can profile it and see where the bottlenecks are. If performance is still unacceptable after some basic optimizations, you can rewrite those parts in C.
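As a rough sketch of the "profile first" step, the standard-library `cProfile` and `pstats` modules can show where the time goes. The `moving_average` and `backtest` functions below are hypothetical stand-ins invented for illustration, not part of any real trading system:

```python
import cProfile
import io
import pstats


def moving_average(prices, window=50):
    """Hypothetical hot spot: recomputes every window sum from scratch."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def backtest(prices):
    """Hypothetical strategy run: count the bars that close above their moving average."""
    averages = moving_average(prices)
    return sum(1 for p, m in zip(prices, averages) if p > m)


def profile_backtest(prices, top=5):
    """Run backtest under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(backtest, prices)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    prices = [100 + (i % 17) for i in range(20_000)]
    _, report = profile_backtest(prices)
    print(report)
```

In a report like this, whichever function dominates the cumulative time is the candidate for optimization or for a rewrite in C.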

Another consideration may be to write this in IronPython so you can make use of the CLR and DLR in .NET. Then you can use .NET 4 and the parallel extensions. If anything is going to give you a performance boost, it will be some flavor of threading, which .NET does extremely well.

Edit:

I just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where most of the performance gain will come from.
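To illustrate why this kind of workload parallelizes well, here is a minimal sketch using the standard-library `concurrent.futures` module, assuming each simulation is independent of the others. The `simulate` function is a made-up placeholder (a seeded random walk), not a real trading system:

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def simulate(seed, n_bars=1_000):
    """Hypothetical stand-in for one simulation: final P&L of a seeded random walk."""
    rng = random.Random(seed)
    pnl = 0.0
    for _ in range(n_bars):
        pnl += rng.gauss(0.0, 1.0)
    return seed, pnl


def run_all(seeds, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run one independent simulation per seed and return {seed: final P&L}."""
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches the work items sent to each worker process;
        # thread pools accept the argument but ignore it.
        return dict(pool.map(simulate, seeds, chunksize=16))


if __name__ == "__main__":
    # Sanity check: the results do not depend on which executor runs them.
    assert run_all(range(4), executor_cls=ThreadPoolExecutor) == dict(map(simulate, range(4)))

    results = run_all(range(100))
    best = max(results, key=results.get)
    print(f"best seed: {best}, final P&L: {results[best]:.2f}")
```

Because the simulations share no state, the same pattern extends from the cores of one machine to many machines by handing each server its own slice of the seeds.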

+4

Why use only one language for your system? If I were you, I would build the entire system in Python, but use C or C++ for the performance-critical components. That way you get a very flexible and extensible system with sufficiently high performance. You can even find tools that generate the wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran do not compete with each other; they complement each other.

+3

It is useful to look at the inner loop of your numerical code. In the end, you will spend most of your CPU time inside that loop.

If the inner loop is a matrix operation, then I suggest Python and SciPy. If the inner loop is not a matrix operation, I would be worried about Python being slow. (Or maybe I would wrap C++ in Python using SWIG or Boost.Python.)
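As an illustration of the matrix case, here is a small sketch that computes a portfolio variance (weights times covariance matrix times weights) both as an explicit Python double loop and as a single NumPy matrix expression. The choice of portfolio variance as the inner loop, and the numbers, are assumptions made for the example:

```python
import numpy as np


def portfolio_variance_loop(weights, cov):
    """Inner loop written out in pure Python: sum of w_i * cov_ij * w_j."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += weights[i] * cov[i][j] * weights[j]
    return total


def portfolio_variance_matrix(weights, cov):
    """Same quantity as one matrix expression, evaluated in NumPy's compiled code."""
    w = np.asarray(weights, dtype=float)
    c = np.asarray(cov, dtype=float)
    return float(w @ c @ w)


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]
    cov = [
        [0.04, 0.01, 0.00],
        [0.01, 0.09, 0.02],
        [0.00, 0.02, 0.16],
    ]
    print(portfolio_variance_loop(weights, cov))
    print(portfolio_variance_matrix(weights, cov))
```

Both functions return the same value; the difference is that the matrix form does its looping outside the Python interpreter, which is where the speed comes from as the number of strategies grows.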

The advantage of python is that it is easy to debug, and you save a lot of time without having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internal components.

0

I would go with PyPy. If not, http://lolcode.com/.

-1