David is Coding

Just another techie blog

Real-Time Twitter Analysis 3: Tweet Analysis on Spark

Real-Time Analysis on Spark

We already have a Twitter stream ingested into our cluster using Flume and Kafka, as described in my previous post. The next step is to process and analyze the tweets read from a Kafka topic with Apache Spark Streaming. Our goal here is to make some calculations on top of the received tweets in order […]

Real-Time Twitter Analysis 2: Twitter Stream with Flume

Ingesting Twitter Stream

We already discussed the architecture for this project in my previous post here. Now it’s time to dive in and start working on it. The first step is to ingest the Twitter stream into our cluster. For this task, we’ll use Apache Flume and Apache Kafka, which in conjunction are also known as […]

Real-Time Twitter Analysis 1: Introduction

After setting up Cloudera’s Quickstart VM, as described in my previous post, it’s time to get some hands-on experience with data engineering. For this purpose, I opted to perform a real-time sentiment analysis of Twitter. The idea is to put into play the different tools and skills I acquired during the Big Data […]

Installing Spark 2 and Kafka on Cloudera’s Quickstart VM

As you probably know, to work with Big Data, we need a cluster of several nodes. Unfortunately, most people don’t have access to one. If we want to learn how to use the underlying technologies, we need to use a VM with a pseudo-cluster set up in it, and a set of […]