How to learn about developing high-volume transactional systems?

I have mainly worked on data analysis, BI tools, and the like. Most of the applications I work on are essentially read-only. I have also worked on simple CRUD applications, but nothing heavily transactional. As a software engineer, I feel there is a gap in my training if I do not know how to develop high-volume transactional systems and databases, for example how Amazon, airline systems, etc. work. I would like to ask the community for resources, books, or simple projects on this subject, ideally something that takes a practical approach while teaching the necessary theory. I know this is a subjective question, but I will mark the most useful answer as accepted. I look forward to your suggestions and thank you in advance.

+8
database-design distributed transactions distributed-transactions high-availability
1 answer

I am going to organize the answer in four broad categories, namely

  • theoretical and academic background,
  • popular sources,
  • software and tools, and
  • exercises.

Books and articles

This is the foundation of the field: how to go from zero to a fairly decent, competent level, though mostly in theory.

Entry level

Transaction Processing: Concepts and Techniques (Morgan Kaufmann Series in Data Management Systems) by Jim Gray and Andreas Reuter.

The Silberschatz book (Database System Concepts) covers the internal workings of advanced transaction systems in its later chapters and lists further resources.

Database-specific

H-Store paper - describes the benefits of in-memory processing for large transactional workloads. The H-Store work inspired the development of VoltDB.
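To make the in-memory idea a bit more concrete, here is a minimal sketch of one way to read it: data is split into partitions held in memory, and each partition runs its transactions strictly one at a time, so no locking is needed inside a partition. This is not H-Store or VoltDB code. The names `Partition`, `deposit`, `withdraw`, and `run_demo` are hypothetical, and the dictionary copy used for rollback is a simplification that a real engine would not use.

```python
from collections import deque


class Partition:
    """One in-memory partition; its transactions run strictly one at a time."""

    def __init__(self):
        self.rows = {}        # all data lives in memory
        self.queue = deque()  # transactions waiting to run

    def submit(self, txn):
        self.queue.append(txn)

    def run(self):
        """Drain the queue serially; no locks are needed inside a partition."""
        outcomes = []
        while self.queue:
            txn = self.queue.popleft()
            snapshot = dict(self.rows)  # naive undo image, fine for a sketch
            try:
                txn(self.rows)
                outcomes.append("commit")
            except ValueError:
                self.rows = snapshot    # abort: restore the previous state
                outcomes.append("abort")
        return outcomes


def deposit(account, amount):
    def txn(rows):
        rows[account] = rows.get(account, 0) + amount
    return txn


def withdraw(account, amount):
    def txn(rows):
        balance = rows.get(account, 0)
        if balance < amount:
            raise ValueError("insufficient funds")
        rows[account] = balance - amount
    return txn


def run_demo():
    partitions = [Partition(), Partition()]
    work = [(1, deposit(1, 100)), (2, deposit(2, 50)),
            (1, withdraw(1, 30)), (2, withdraw(2, 80))]
    for account, txn in work:
        # Route each single-account transaction to the partition owning the key.
        partitions[account % len(partitions)].submit(txn)
    balances = {}
    for partition in partitions:
        partition.run()
        balances.update(partition.rows)
    return balances


if __name__ == "__main__":
    print(run_demo())
```

In this run, account 1 ends at 70, while account 2 stays at 50 because the overdraft aborts and is rolled back. Transactions that touch several partitions are the hard part and are deliberately left out here.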

Calvin paper - Fast Distributed Transactions for Partitioned Database Systems. It gives very good background, related work, and a picture of the state of the art.

Architecture of a Database System by Hellerstein, Stonebraker, and Hamilton covers many aspects.
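The central idea in Calvin, as I understand it, is to agree on a global order of transactions before executing them, and then have every replica execute that order deterministically. The sketch below is a toy illustration of that idea only, not the paper's protocol. `Sequencer`, `Replica`, and `run_calvin_demo` are hypothetical names, and the single in-process list stands in for a replicated log.

```python
class Sequencer:
    """Assigns every transaction a position in one global, append-only log."""

    def __init__(self):
        self.log = []

    def submit(self, txn):
        self.log.append(txn)
        return len(self.log) - 1  # global sequence number


class Replica:
    """Applies the agreed log in order using purely deterministic logic."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries already executed

    def catch_up(self, log):
        for txn in log[self.applied:]:
            self.apply(txn)
        self.applied = len(log)

    def apply(self, txn):
        kind = txn[0]
        if kind == "set":
            _, key, value = txn
            self.state[key] = value
        elif kind == "transfer":
            _, src, dst, amount = txn
            # The abort decision depends only on state, so every replica agrees.
            if self.state.get(src, 0) >= amount:
                self.state[src] = self.state.get(src, 0) - amount
                self.state[dst] = self.state.get(dst, 0) + amount


def run_calvin_demo():
    sequencer = Sequencer()
    eager, lazy = Replica(), Replica()
    for txn in [("set", "a", 100), ("set", "b", 0),
                ("transfer", "a", "b", 60), ("transfer", "a", "b", 60)]:
        sequencer.submit(txn)
        eager.catch_up(sequencer.log)  # applies each entry as it arrives
    lazy.catch_up(sequencer.log)       # applies the whole log later
    return eager.state, lazy.state


if __name__ == "__main__":
    print(run_calvin_demo())
```

Both replicas end in the same state even though one applied entries as they arrived and the other caught up afterwards. The second transfer is skipped on both, because the decision depends only on the ordered log.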

Limitations and Boundaries

A great article on the virtues and limitations of highly available transactions.

CAP theorem article - on the trade-offs between consistency, availability, and partition tolerance in large-scale systems. Very important.
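As a rough illustration of the trade-off, here is a toy model with two replicas of a single value. When the network between them is partitioned, a write either has to be refused (keeping the replicas consistent) or accepted locally (keeping the system available but letting the replicas diverge). Everything here, including the `Cluster` class and the "CP" / "AP" mode strings, is a hypothetical simplification and not how any particular database implements it.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to protect consistency."""


class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value


class Cluster:
    """Two replicas of one value; `mode` picks what to give up in a partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.left = Replica("left", 1)
        self.right = Replica("right", 1)
        self.partitioned = False

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: replicate synchronously to both copies.
            self.left.value = self.right.value = value
            return
        if self.mode == "CP":
            raise Unavailable("cannot reach peer, refusing the write")
        replica.value = value  # AP: accept locally and let the replicas diverge


def run_cap_demo(mode):
    cluster = Cluster(mode)
    cluster.partitioned = True
    try:
        cluster.write(cluster.left, 2)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, cluster.left.value, cluster.right.value


if __name__ == "__main__":
    print(run_cap_demo("CP"))
    print(run_cap_demo("AP"))
```

In "CP" mode the write is rejected and both replicas keep the old value. In "AP" mode the write succeeds but the two replicas now disagree and would need to be reconciled once the partition heals.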

Parallel Processing and Parallel Databases

Popular and current sources

Blogs

High Scalability is the perfect blog for what you are looking for. For example, here's a great entry on the evolution of Amazon's architecture. Very close to what you were looking for.

Facebook, LinkedIn, and Twitter engineering blogs are great resources. I would also check out Google Research and their Google+ site. Netflix is also good.

Conferences

The VLDB and SIGMOD conferences (including the SIGMOD blog) are where most state-of-the-art data systems are presented by researchers from academia and industry.

HPTS is an interesting conference / seminar with a good agenda and publications.

I would even check the USENIX series for advanced systems topics.

Case Study Architecture

VoltDB is a highly transactional in-memory database developed by Mike Stonebraker, ACM Turing Award winner and the "father" of many state-of-the-art database concepts.

The IBM mainframe still holds a very important place in the world of high-volume transaction processing. At the time of writing this answer, IBM is touting its z13 system for extreme volumes of encrypted transaction processing.

If you're interested in Big Data transactions, there are many options, but HBase is probably the most interesting. Here are some recommended reading sources for HBase: Yahoo Omid, HBase-Based HBase Transactions.

Another interesting architecture is Twitter's (now Apache) Storm, together with Apache Kafka, for real-time streaming and processing.
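The core abstraction behind Kafka-style streaming is an append-only log that several consumers read independently, each tracking its own offset. Here is a minimal in-process sketch of that idea. It does not use Kafka or its client API: `Log`, `Consumer`, and `run_log_demo` are hypothetical names, and partitions, retention, and persistence are all left out.

```python
class Log:
    """Append-only sequence of records; a record's index is its offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


class Consumer:
    """Reads the log at its own pace by remembering the next offset to read."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


def run_log_demo():
    log = Log()
    for event in ["order-1", "order-2", "order-3"]:
        log.append(event)
    billing, audit = Consumer(log), Consumer(log)
    first = billing.poll(max_records=2)  # billing reads two records
    log.append("order-4")
    rest = billing.poll()                # then continues from its own offset
    everything = audit.poll()            # audit starts from offset 0
    return first, rest, everything


if __name__ == "__main__":
    print(run_log_demo())
```

Because reading does not remove records, a slow or newly added consumer can replay the stream from the beginning without affecting the others.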

Benchmarks and Exercises

If you want to try a few things hands-on, check out the TPC family of benchmarks. There are benchmarks for transactional, analytic, ETL, BI, and decision-support / mixed workloads. They are relationally oriented.

You can take these benchmarks and apply them to open source SMP databases (e.g. PostgreSQL, MySQL) and MPP databases such as Greenplum (the link goes to extensive documentation on queries, performance, some sample settings, and how queries are processed in MPP).

I recommend these practical scenarios and architectures for HBase-oriented transaction systems.

For modern message- and actor-oriented transaction systems, you may have to buy a book or two. For Akka (which is used internally by Spark), you can probably use Akka in Action and complete the exercises at the end of each chapter. There are also exercises from training sessions here.
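If you have not met the actor style before, the basic idea is that each actor owns its state privately and processes messages from a mailbox one at a time, so concurrent senders never touch the state directly. The sketch below shows that idea with Python threads and queues. It is not Akka (which is a Scala/Java toolkit) and does not mirror its API. `Actor`, `Account`, `tell`, and `run_actor_demo` are hypothetical names chosen for illustration.

```python
import queue
import threading

_STOP = object()  # sentinel that tells an actor to shut down


class Actor:
    """Owns a mailbox and handles its messages one at a time on one thread."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        self.mailbox.put(message)  # fire-and-forget send

    def stop(self):
        self.mailbox.put(_STOP)    # processed after all earlier messages
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Account(Actor):
    def __init__(self, balance=0):
        self.balance = balance  # set before the actor thread starts
        super().__init__()

    def receive(self, message):
        kind, amount = message
        if kind == "deposit":
            self.balance += amount
        elif kind == "withdraw" and self.balance >= amount:
            self.balance -= amount


def run_actor_demo(senders=4, deposits_each=100):
    account = Account()

    def send_deposits():
        for _ in range(deposits_each):
            account.tell(("deposit", 1))

    threads = [threading.Thread(target=send_deposits) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    account.stop()
    return account.balance


if __name__ == "__main__":
    print(run_actor_demo())
```

Several threads send deposits concurrently, yet the balance is only ever modified by the actor's own thread, so no lock around `balance` is needed.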

For stream processing, here are some good exercises with Apache Kafka (part 1 and part 2: http://www.confluent.io/blog/stream-data-platform-2/). Cloudera also has a good getting-started resource.

To practice modern message-oriented systems, I would suggest "Getting Started with Storm" and possibly going through these exercises. They include a number of real topologies.

For good old JMS, you can practice with this online tutorial, or try the more complex ActiveMQ exercises.

If you want to torture yourself with the mainframe, try this emulator. It emulates IBM OS/370-390.

+8