David is Coding

Just another techie blog

Real-Time Twitter Analysis 4: Displaying the Results

In the previous post, we processed a stream of tweets in real time with Spark Streaming to calculate some information, such as top rankings and counters. Now it is time to display this data in a form that is easier for humans to consume. In this post, we’ll create a simple web-based dashboard by using […]

Real-Time Twitter Analysis 3: Tweet Analysis on Spark

Real-Time Analysis on Spark

We already have a Twitter stream ingested into our cluster using Flume and Kafka, as described in my previous post. The next step is to process and analyze tweets taken from a Kafka topic with Apache Spark Streaming. Our goal here is to make some calculations on top of the received tweets in order […]

Real-Time Twitter Analysis 1: Introduction

After setting up Cloudera’s Quickstart VM, as described in my previous post, it’s time for some hands-on Data Engineering. For this purpose, I opted to perform a real-time sentiment analysis of Twitter. The idea is to put into play the different tools and skills I got during the Big Data […]

Installing Spark 2 and Kafka on Cloudera’s Quickstart VM

As you probably know, working with Big Data requires a cluster of several nodes. Unfortunately, most people don’t have access to one. If we want to learn how to use the underlying technologies, we need to use a VM with a pseudo-cluster assembled in it, and a set of […]