Kafka and its real-world implementation – Part 1

Kafka is one of the main components of the Hadoop ecosystem.

To achieve real-time speed in data processing, Hadoop stack designers often select Kafka as one of the components during the project stack-design phase. In layman's terms, you can think of Kafka as a robust, low-latency, distributed version of an enterprise messaging system. It publishes and subscribes to streams of records.
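To make the publish/subscribe idea concrete, here is a toy, in-memory stand-in for a Kafka topic. This is not the real Kafka API — real Kafka is distributed and persistent — it only sketches the core idea: producers append records to a log, and each consumer tracks its own read offset.

```python
from collections import defaultdict

class ToyTopic:
    """A toy, in-memory stand-in for a Kafka topic: an append-only log
    plus per-consumer read offsets. Illustrative only."""

    def __init__(self):
        self.log = []                    # append-only record log
        self.offsets = defaultdict(int)  # consumer name -> next offset to read

    def publish(self, record):
        self.log.append(record)

    def poll(self, consumer):
        """Return every record this consumer has not yet seen and advance
        its offset, mimicking a Kafka consumer's committed position."""
        start = self.offsets[consumer]
        records = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return records

topic = ToyTopic()
topic.publish({"price": 101.5})
topic.publish({"price": 102.0})
print(topic.poll("analytics"))   # both records, in publish order
topic.publish({"price": 99.8})
print(topic.poll("analytics"))   # only the record published since the last poll
```

Note that two independent consumers each get the full stream, because each keeps its own offset — this is what lets Kafka fan the same data out to many subscribers.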

So when is Kafka used in IT projects?

If the end user or client wants the desired output or report to be generated immediately after the data is produced at the source, and additionally wants all the required processing over that data to be done with almost zero wait time before the final output is generated, then Kafka is the best option.

Consider an input of gigabytes of market data, where the client already has a competitor in the business and simply wants to stay ahead of that competitor to capture the market. To achieve this, and give the client the fruit of direct business growth, you should make the processed information reach your client much faster than it reaches others (sometimes 100x faster), so that the client can make decisions well ahead of the competition. Nowadays, since every client is keen to speed up their business, it can be said that Kafka fits into the stack design of almost every business scenario.


Capacity Planning for Kafka

Kafka always runs on a separate cluster on different servers. Kafka processes not only data at rest or table data, but also continuous streams of data on the fly. In standalone mode, at least 8 GB of RAM (and, ideally, 30 GB of storage) is required for the Kafka server to start along with ZooKeeper.
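A minimal broker configuration for such a standalone setup might look like the fragment below. The values are illustrative, not prescriptive — tune them to your own hardware:

```
# server.properties – illustrative values for a small standalone broker
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/var/kafka-logs          # point this at the ~30 GB data disk
zookeeper.connect=localhost:2181  # ZooKeeper must be running first
num.partitions=3
log.retention.hours=168           # keep data for 7 days
```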


Kafka stores and replicates data to ensure that data consistency is maintained, along with each consumer's read position, even in case of unexpected hiccups in server performance (Kafka is fail-safe). For the best page caching, Linux is the best OS to opt for with Kafka. Kafka 0.8 and above also supports topic partitioning, which further speeds up processing. Multiple ZooKeeper servers and multiple brokers should be installed on a single network. Install Kafka at every stage of the software life cycle to get the best out of the producer and consumer mechanism.
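Topic partitioning speeds things up because each partition can be written and read independently. The sketch below shows the idea behind keyed partition assignment: hash the record key modulo the partition count, so every record with the same key lands on the same partition. Note this is a simplified stand-in — the real Kafka Java client uses murmur2 hashing of the key bytes, not the toy hash used here.

```python
def assign_partition(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default keyed partitioner:
    hash the key, then take it modulo the partition count. The real
    client uses murmur2; a toy polynomial hash is used here only to
    keep the sketch dependency-free and deterministic."""
    h = 0
    for b in key:
        h = (h * 31 + b) & 0x7FFFFFFF  # simple deterministic hash
    return h % num_partitions

# Records with the same key always map to the same partition, which is
# what preserves per-key ordering across a partitioned topic.
p1 = assign_partition(b"AAPL", 4)
p2 = assign_partition(b"AAPL", 4)
assert p1 == p2
```

Because all records for a given key share a partition, a consumer reading that partition sees them in exactly the order they were produced.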

Kafka not only broadcasts and stores data, but also has a feature to process data on the fly, though this is limited to simple processing. In coordination with Flume, Kafka can also handle complex processing of the data, and message ordering is guaranteed within a Kafka partition.
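The "simple processing" typically done on the fly is a stateless, per-record transformation sitting between a consumer and a downstream sink. A hedged sketch (the record fields here are made up for illustration):

```python
def enrich(records):
    """Stateless per-record transformation of a stream - the kind of
    simple processing that fits between a Kafka consumer and a sink.
    Each record is handled as it arrives, so stream order is preserved."""
    for rec in records:
        if rec["price"] <= 0:  # drop malformed records
            continue
        # round() avoids float truncation errors (e.g. 99.8 * 100 -> 9979.99...)
        yield {**rec, "price_cents": round(rec["price"] * 100)}

stream = [{"sym": "AAPL", "price": 101.5},
          {"sym": "BAD",  "price": -1.0},
          {"sym": "MSFT", "price": 99.8}]
print(list(enrich(stream)))  # two valid records, order preserved
```

Anything stateful or multi-stream (joins, windows, aggregations) is the "complex processing" the paragraph above defers to Kafka-plus-Flume pipelines or a dedicated stream processor.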

–In the next part, we will discuss Kafka installation and a real-time streaming example

–In upcoming sessions, we will also discuss the beauty and power of Kafka-Flume integration

                                                                    Keep an eye on this page!



Please leave a comment, as it really matters. Your comments are our energy boosters!
