You are the Architect for a new Big Data-based Data Lake system for a high-volume trading exchange that processes approximately one million transactions per second; at peak volume it can reach up to five million transactions per second. For ease of reference, let's call this system the Exchange Store.
Create an efficient, scalable, fault-tolerant, and highly available system for the Exchange Store:

- Data is sourced via the following options:
  - Real-time messages (~400k messages per second) via Kafka (see the streaming ingestion sketch after this list)
  - Start-of-day positions via files (~10k files in various formats such as CSV, TXT, JSON, and XML; a multi-format batch sketch follows the hint further below)
- Reports are generated at End of Day (EOD) via the following options:
  - A Kafka topic, for consumers who need to process EOD positions and trades (see the EOD publishing sketch after this list)
  - EOD feed files (~12k feed files) delivered to consumers via different mechanisms (SFTP, object store, etc.)
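For the real-time path, here is a minimal ingestion sketch using Spark Structured Streaming from Kafka. The topic name `trades`, the broker addresses, the trade schema, and the lake paths are all hypothetical placeholders; sustaining ~400k messages per second would additionally require tuning partition counts, trigger intervals, and cluster sizing beyond what is shown.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, LongType, StringType,
                               StructField, StructType)

# Minimal sketch: consume trade messages from Kafka and append them to the
# lake as Parquet. Topic, brokers, schema, and paths are placeholders.
spark = (SparkSession.builder
         .appName("exchange-store-trade-ingest")
         .getOrCreate())

trade_schema = StructType([
    StructField("trade_id", StringType()),
    StructField("symbol", StringType()),
    StructField("price", DoubleType()),
    StructField("quantity", LongType()),
    StructField("event_time", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
       .option("subscribe", "trades")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; decode and parse the JSON payload.
trades = (raw
          .select(from_json(col("value").cast("string"), trade_schema).alias("t"))
          .select("t.*"))

# Checkpointing makes restarts safe and, with the file sink, gives
# end-to-end exactly-once semantics.
query = (trades.writeStream
         .format("parquet")
         .option("path", "s3a://exchange-store/lake/trades/")
         .option("checkpointLocation", "s3a://exchange-store/chk/trades/")
         .trigger(processingTime="10 seconds")
         .start())

query.awaitTermination()
```

Apache Flink would be an equally reasonable engine for this path; Structured Streaming is sketched here only so the same framework also covers the batch jobs below.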
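For the EOD Kafka output, one possible shape is a batch job that aggregates the day's ingested trades into net positions and publishes them back to a topic. The topic name `eod-positions`, the aggregation, and the paths are illustrative assumptions, not a prescribed design.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal EOD sketch: aggregate the day's ingested trades into net positions
# and publish them to a Kafka topic for downstream consumers. All names and
# paths are illustrative.
spark = SparkSession.builder.appName("exchange-store-eod-publish").getOrCreate()

trades = spark.read.parquet("s3a://exchange-store/lake/trades/")

positions = (trades
             .groupBy("symbol")
             .agg(F.sum("quantity").alias("net_quantity")))

# The Kafka sink expects a string/binary "value" column (and an optional "key").
(positions
 .select(F.col("symbol").cast("string").alias("key"),
         F.to_json(F.struct("symbol", "net_quantity")).alias("value"))
 .write
 .format("kafka")
 .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
 .option("topic", "eod-positions")
 .save())
```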
Use a relevant database, technology stack, frameworks, and tools to create an efficient system that processes these huge volumes with minimal delay. Where possible, also explain why you would choose each tool over the alternatives.

Hint: some options that come to mind are Apache Spark, Apache Storm, Apache Flink, Apache Hadoop, Apache HBase, Apache Hive, Amazon EMR, Azure HDInsight, GCP Dataproc, etc.
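To make the start-of-day file requirement concrete, here is a minimal Spark batch sketch. It assumes files land in a staging area grouped by format and can be normalized to a small common schema (`account_id`, `symbol`, `quantity`, and `cost_basis` are invented column names); XML reading assumes the external spark-xml package, and TXT files, which would need a bespoke parser, are omitted.

```python
from pyspark.sql import SparkSession

# Minimal sketch of start-of-day position ingestion, assuming files land in a
# staging area grouped by format. Column names are invented; XML reading
# requires the external spark-xml package on the classpath.
spark = SparkSession.builder.appName("exchange-store-sod-ingest").getOrCreate()

common_cols = ["account_id", "symbol", "quantity", "cost_basis"]

csv_df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("s3a://exchange-store/staging/sod/csv/"))

json_df = spark.read.json("s3a://exchange-store/staging/sod/json/")

xml_df = (spark.read
          .format("xml")                 # provided by spark-xml
          .option("rowTag", "position")  # hypothetical row element
          .load("s3a://exchange-store/staging/sod/xml/"))

# Normalize to the shared schema and land one consolidated dataset in the lake.
positions = (csv_df.select(common_cols)
             .unionByName(json_df.select(common_cols))
             .unionByName(xml_df.select(common_cols)))

(positions.write
 .mode("overwrite")
 .parquet("s3a://exchange-store/lake/sod_positions/"))
```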
- Also consider the maintainability and operational aspects of the deployment (observability); a metrics-export sketch follows this list.
- You can use any non-cloud or cloud platform (e.g., Cloud Foundry, AWS, Azure, or GCP) to overlay your deployment diagram and leverage features from these platforms.
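On the observability point, one lightweight pattern is to export a streaming query's progress as Prometheus metrics. This sketch assumes the `query` object from the earlier ingestion sketch and the `prometheus_client` library; the metric names are illustrative.

```python
import time

from prometheus_client import Gauge, start_http_server

# Observability sketch: poll a Structured Streaming query's progress and
# expose key figures for Prometheus to scrape. Metric names are illustrative.
input_rate = Gauge("exchange_store_input_rows_per_second",
                   "Input rate reported by the streaming query")
batch_ms = Gauge("exchange_store_trigger_execution_ms",
                 "End-to-end duration of the last micro-batch")

def export_progress(query, port=8000, interval_s=10):
    """Serve /metrics on the given port while the query is running."""
    start_http_server(port)
    while query.isActive:
        progress = query.lastProgress  # dict, or None before the first batch
        if progress:
            input_rate.set(progress.get("inputRowsPerSecond") or 0.0)
            batch_ms.set(progress.get("durationMs", {}).get("triggerExecution", 0))
        time.sleep(interval_s)
```

In practice you would complement this with engine-native metrics (Spark's metrics system or Flink's reporters) and platform dashboards, but the polling loop shows the core idea.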
Summarize your solution in a table like the following:

| Name | Solution | Comments |
|---|---|---|
| Name | Solution | This architecture uses so-and-so. This is sample text. |