Kafka is one of the main components of the Hadoop ecosystem
To achieve real-time speed in data processing, Hadoop stack designers select Kafka as one of the components during the project stack design phase. In layman's terms, you can think of Kafka as a robust, low-latency, distributed version of an enterprise messaging system. It publishes and subscribes to streams of records.
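To make the publish/subscribe idea concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the `Topic` class and its `publish`/`poll` methods are hypothetical names chosen for illustration, and it assumes a single partition. It shows two ideas Kafka is built on: producers append records to an ordered log, and each consumer group keeps its own read position (offset), so several subscribers can read the same stream independently.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical toy stand-in for a single-partition Kafka topic:
    an append-only list of records plus a read offset per consumer group."""
    name: str
    log: list = field(default_factory=list)
    offsets: dict = field(default_factory=dict)  # consumer group -> next offset to read

    def publish(self, record):
        """Append a record to the end of the log and return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for a consumer group
        and advance that group's offset past them."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    ticks = Topic("market-ticks")
    for price in (101.5, 101.7, 101.2):
        ticks.publish({"symbol": "ACME", "price": price})
    # Two independent subscribers each read the full stream at their own pace.
    print(ticks.poll("dashboard"))
    print(ticks.poll("alerts", max_records=2))
    print(ticks.poll("alerts"))
```

Keeping offsets per group, rather than deleting records once read, is what lets a new subscriber join later and still read the stream from the beginning.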
When to use Kafka in IT projects
If the end user or client wants the desired output or report to be generated immediately after the data is produced at the source, and also wants all the required processing on that data to be done with almost zero wait time before the final output is generated, Kafka is the best option.
Suppose the input is gigabytes of market data and the client already has a competitor in their business. The client simply wants their company to stay ahead of that competitor and capture the market. To achieve this, and give the client direct business growth, the processed information must reach your client much faster than it reaches others (sometimes 100 times faster), so that the client can make decisions well ahead of the competitor. Nowadays almost every client is keen to speed up their business, so Kafka can be accommodated in the stack design of almost every business scenario.
Capacity Planning for Kafka
Kafka always runs on a separate cluster of its own servers. Kafka handles not only data at rest, such as table data, but also continuous streams of data on the fly. In standalone mode, at least 8 GB of RAM (and ideally 30 GB of storage) is required for the Kafka server to start along with ZooKeeper.
Kafka stores and replicates data so that consistency is maintained, along with each consumer's read position (offset), even if a server fails unexpectedly; in other words, Kafka is fault tolerant. Linux is the best OS for Kafka because it gives the best page caching. Kafka 0.8 and above also supports topic partitioning, which further speeds up processing. Multiple ZooKeeper servers and multiple brokers should be installed on the same network. Install Kafka at each stage of the software life cycle (for example, development, testing, and production) to get the best producer and consumer setup.
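The sketch below illustrates why topic partitioning helps, under simplifying assumptions: records are (key, value) pairs, and a record's partition is chosen by hashing its key modulo the number of partitions. Kafka's Java producer uses a murmur2 hash for keyed records; `zlib.crc32` is swapped in here only to keep the example standard-library-only, and `choose_partition` and `partition_records` are hypothetical helpers, not Kafka APIs.

```python
import zlib
from collections import defaultdict


def choose_partition(key: str, num_partitions: int) -> int:
    """Hypothetical helper: map a record key to a partition number.
    Kafka's Java producer hashes keys with murmur2; crc32 is a stand-in."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Hypothetical helper: group (key, value) records by partition,
    appending in arrival order so each partition stays ordered."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    trades = [("ACME", 1), ("INIT", 2), ("ACME", 3), ("ACME", 4)]
    for partition, recs in sorted(partition_records(trades, 3).items()):
        print(partition, recs)
```

Because the same key always maps to the same partition, all records for one key stay in order, while different partitions can be written and read in parallel by different brokers and consumers.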
Kafka not only broadcasts and stores data but can also process data on the fly, although this is limited to simple processing. More complex processing of the data can be handled by using Kafka in coordination with Flume. Kafka also guarantees message ordering within a partition.
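As a rough illustration of the simple, on-the-fly processing mentioned above, the following Python sketch applies a per-record transform to records in arrival order. It uses plain generators rather than any Kafka API; `process_stream` and `flag_big_moves` are hypothetical names, and the 1% threshold is an arbitrary example value.

```python
from typing import Callable, Iterable, Iterator, Optional


def process_stream(records: Iterable[dict],
                   transform: Callable[[dict], Optional[dict]]) -> Iterator[dict]:
    """Apply a simple per-record transform as records arrive.
    Returning None from `transform` drops the record (acts as a filter)."""
    for record in records:
        result = transform(record)
        if result is not None:
            yield result


def flag_big_moves(record: dict) -> Optional[dict]:
    """Hypothetical example transform: keep only ticks whose price moved
    more than 1% from the previous price."""
    change = (record["price"] - record["prev"]) / record["prev"]
    if abs(change) <= 0.01:
        return None
    return {"symbol": record["symbol"], "change_pct": round(change * 100, 2)}


if __name__ == "__main__":
    ticks = [
        {"symbol": "A", "price": 102.0, "prev": 100.0},
        {"symbol": "B", "price": 100.5, "prev": 100.0},
        {"symbol": "C", "price": 97.0, "prev": 100.0},
    ]
    for alert in process_stream(ticks, flag_big_moves):
        print(alert)
```

Each record is handled independently with no shared state, which is what keeps this kind of processing simple; output records keep their input order, mirroring Kafka's ordering within a partition.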
– Next, we will discuss Kafka installation and a real-time streaming example.
– In upcoming sessions, we will also discuss the beauty and power of Kafka-Flume integration.
Keep an eye on this page.