Articles & News

09 October 2014

Bay Area bike share analysis with the Hadoop Notebook and Spark & SQL

This post was initially published on the Hue project blog: https://gethue.com/bay-area-bike-share-data-analysis-with-spark-notebook-part-2/ Apache Spark is getting popular, and Hue contributors are working on making it accessible to even more users, specifically by creating a Web interface that allows anyone with a browser to type some Spark code and execute it. A Spark submission REST API was built for this purpose and can also be leveraged directly by developers. In a previous post, we demonstrated how to use Hue's Search app to seamlessly index and visualize trip data from Bay Area Bike Share, and how to leverage Spark to supplement that analysis by adding weather data to our dashboard.…

6 minutes read - Tutorial
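The Spark submission REST API mentioned in the teaser accepts JSON payloads describing the code to run. A minimal sketch of building such a payload, assuming a Livy-style statements endpoint (the endpoint path and field names here are illustrative, not taken from the post):

```python
import json

def build_statement(code):
    """Serialize a snippet of Spark code into the JSON body that a
    Livy-style /sessions/{id}/statements endpoint expects
    (illustrative schema)."""
    return json.dumps({"code": code})

# In practice this body would be POSTed to the REST API with an
# HTTP client; here we only construct it.
payload = build_statement("sc.parallelize(range(10)).sum()")
```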

08 November 2013

Season II: 8. How to transfer data from Hadoop with Sqoop 2

Note: Sqoop2 has since been replaced by the importer described at https://gethue.com/importing-data-from-traditional-databases-into-hdfshive-in-just-a-few-clicks/  Apache Sqoop is a great tool for moving data (in files or databases) into or out of Hadoop. In Hue 3, a new app was added to make Sqoop2 easier to use. In this final episode (the previous one was about Search) of season 2 of the Hadoop Tutorial series, let's see how simple it becomes to export our Yelp results into a MySQL table!…

2 minutes read - Tutorial
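An export like the one in this episode comes down to telling Sqoop which HDFS directory holds the results and which JDBC table to fill. A sketch assembling the arguments of the classic `sqoop export` CLI (the table, paths, and JDBC URL are made up for illustration; Sqoop2 itself configures jobs through its UI and REST API rather than these flags):

```python
def sqoop_export_args(table, export_dir, jdbc_url, user):
    """Build the argument list for a `sqoop export` invocation
    (hypothetical table/paths; run via subprocess on a real cluster)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,      # JDBC URL of the target database
        "--username", user,
        "--table", table,           # target MySQL table
        "--export-dir", export_dir, # HDFS directory holding the results
    ]

args = sqoop_export_args("top_cool", "/user/hue/yelp/top_cool",
                         "jdbc:mysql://db-host/stats", "hue")
```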

04 November 2013

Season II: 7. How to index and search Yelp data with Solr

In the previous episode we saw how to use Pig and Hive with HBase. This time, let's see how to make our Yelp data searchable by indexing it and building a customizable UI with the Hue Search app.    Indexing data into Solr  This tutorial is based on SolrCloud. Here is a step-by-step guide to its installation, with the list of required packages: solr-server, solr-mapreduce, search. The next step is deploying and configuring SolrCloud.…

3 minutes read - Tutorial
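Once SolrCloud is up, indexing comes down to POSTing documents to a collection's update handler. A small sketch building that request, assuming a collection named `yelp` (the collection name and document fields are illustrative):

```python
import json

def solr_update_request(collection, docs,
                        base_url="http://localhost:8983/solr"):
    """Return (url, body) for Solr's JSON update handler; commit=true
    makes the documents searchable immediately (illustrative collection)."""
    url = "{}/{}/update?commit=true".format(base_url, collection)
    return url, json.dumps(docs)

url, body = solr_update_request("yelp", [
    {"id": "biz-1", "name": "Taqueria", "stars": 4.5},
])
```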

21 October 2013

Season II: 6. Use Pig and Hive with HBase

The HBase app is an elegant way to visualize and search a lot of data. Apache HBase tables can be tricky to update, as they require a lower-level API. A good alternative for simplifying data management and access is to use Apache Pig or Hive.  In this post we are going to show how to load our Yelp data from the Oozie Bundles episode into HBase with Hive. Then we will use the HBase Browser to visualize it and Pig to compute some statistics.…

3 minutes read - Tutorial
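Loading data into HBase with Hive works by creating a Hive table backed by the HBase storage handler and then using a plain `INSERT`. A sketch of that DDL, kept as a Python string since the series drives its queries from the Hive app (the table and column names are illustrative, not the post's actual schema):

```python
# HiveQL DDL mapping a Hive table onto an HBase table (illustrative
# names): the first column becomes the row key, the others map into
# column family "d" via hbase.columns.mapping.
create_hbase_table = """
CREATE TABLE yelp_review_hbase (business_id STRING, stars INT, text STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,d:stars,d:text")
TBLPROPERTIES ("hbase.table.name" = "yelp_review")
"""
```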

14 October 2013

Season II: 5. Bundle Oozie coordinators with Hue

Hue provides a great Oozie UI for using Oozie without typing any XML. In Tutorial 3, we demonstrated how to use an Oozie coordinator for scheduling a daily top 10 of restaurants. Now let's imagine that we also want to compute a top 100 alongside the top 10. How can we do this? One solution is to use Oozie bundles.    Workflow and Coordinator updates  Bundles are a way to group coordinators together into a set.…

3 minutes read - Tutorial
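Under the hood, a bundle is a small XML document listing the coordinators to launch together, which is exactly what the Hue editor generates. A sketch producing one with the standard library, assuming two coordinators for the top-10 and top-100 jobs (the names and HDFS paths are illustrative):

```python
import xml.etree.ElementTree as ET

def build_bundle(name, coordinators):
    """Build a minimal Oozie bundle-app XML grouping several
    coordinators (coordinator names and app paths are illustrative)."""
    bundle = ET.Element("bundle-app",
                        {"name": name, "xmlns": "uri:oozie:bundle:0.2"})
    for coord_name, app_path in coordinators:
        coord = ET.SubElement(bundle, "coordinator", {"name": coord_name})
        ET.SubElement(coord, "app-path").text = app_path
    return ET.tostring(bundle, encoding="unicode")

xml_doc = build_bundle("top-restaurants", [
    ("top-10", "/user/hue/oozie/coord_top10"),
    ("top-100", "/user/hue/oozie/coord_top100"),
])
```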

27 September 2013

Season II: 4. Fast SQL with the Impala Query Editor

In the previous episodes, we presented how to schedule repetitive workflows on the grid with the Oozie Coordinator. Let's now look at a fast way to query some data with Impala. Hue, the Hadoop UI, has supported Impala closely since its first version and brings fast interactive queries to your browser. If you are not familiar with Impala, we recommend checking the documentation of the fastest SQL engine for Hadoop. Impala App  Most of Hive SQL is compatible with Impala, and we are going to compare the queries of episode one in both the Hive and Impala applications.…

3 minutes read - Tutorial
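Because Impala accepts most of Hive's SQL, the episode-one query can be pasted into either editor unchanged. A sketch of the kind of "top cool restaurants" query being compared, held as a Python string (the table and column names are illustrative, not the post's exact query):

```python
# Illustrative HiveQL/Impala query of the kind compared in the episode:
# same statement runs in both the Hive and Impala apps.
top_cool_query = """
SELECT b.name, SUM(r.cool) AS coolness
FROM review r JOIN business b ON (r.business_id = b.business_id)
GROUP BY b.name
ORDER BY coolness DESC
LIMIT 10
"""
```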

18 September 2013

Season II: 3. Schedule Hive queries with Oozie coordinators

In the previous episode we saw how to create a Hive action in an Oozie workflow. These workflows can then be repeated automatically with an Oozie coordinator. This post describes how to schedule Hadoop jobs (e.g. run this job every day at midnight). Oozie Coordinators  Our goal: compute the 10 coolest restaurants of the day, every day, for one month. From episode 2, we now have a workflow ready to be run every day.…

2 minutes read - Tutorial
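A daily coordinator is defined by a start time, an end time, and a frequency; Oozie then materializes one workflow run per interval. A sketch computing those materialization instants for one month, mirroring what the coordinator does (the dates are illustrative):

```python
from datetime import datetime, timedelta

def materialize_runs(start, end, frequency=timedelta(days=1)):
    """List the instants at which a coordinator with the given
    frequency would trigger the workflow, like Oozie materialization."""
    runs, t = [], start
    while t < end:
        runs.append(t)
        t += frequency
    return runs

# One month of daily runs, e.g. all of September 2013.
runs = materialize_runs(datetime(2013, 9, 1), datetime(2013, 10, 1))
```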

11 September 2013

Season II: 2. Execute Hive queries and schedule them with Oozie

In the previous episode, we saw how to transfer some file data into Apache Hadoop. To query the data easily, the next step is to create some Hive tables. This enables quick interaction through high-level languages like SQL and Pig.   We experiment with the SQL queries, then parameterize them and insert them into a workflow in order to run them together in parallel.…

5 minutes read - Tutorial
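Parameterizing the queries means replacing hard-coded values with variables that the workflow fills in at run time. A minimal sketch of the same idea using the standard library's template substitution (the query text and parameter names are illustrative, not the episode's actual parameters):

```python
from string import Template

# Illustrative parameterized query; an Oozie workflow would substitute
# ${date} and ${n} the same way when it runs the Hive action.
query = Template("SELECT name FROM top_restaurants "
                 "WHERE day = '$date' ORDER BY score DESC LIMIT $n")

sql = query.substitute(date="2013-09-11", n=10)
```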

05 September 2013

Season II: 1. Prepare the data for analysis with Pig and Python UDF

Welcome to season 2 of the Hue video series. In this new chapter we are going to demonstrate how Hue can simplify Hadoop usage and let you focus on the business problem rather than the underlying technology. In a real-life scenario, we will use various Hadoop tools within the Hue UI to explore some data and extract competitive insights from it.   Let's go surf the Big Data wave, directly from your browser!…

3 minutes read - Tutorial
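Pig can call plain Python functions registered as UDFs (via Jython), with an `@outputSchema` decorator telling Pig the return type. A sketch of the kind of cleanup UDF used to prepare data; outside of Pig the decorator is just a stub, and the function and schema here are illustrative:

```python
# Stand-in for Pig's @outputSchema decorator (provided by Pig's Jython
# runtime); a no-op here so the function can be tested standalone.
def outputSchema(schema):
    def wrap(fn):
        return fn
    return wrap

@outputSchema("clean:chararray")
def strip_quotes(value):
    """Illustrative UDF: trim whitespace and surrounding quotes
    from a text field before loading it."""
    return value.strip().strip('"') if value is not None else None

cleaned = strip_quotes('  "Good food"  ')
```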

More recent stories

26 June 2024
Integrating Trino Editor in Hue: Supporting Data Mesh and SQL Federation
03 May 2023
Discover the power of Apache Ozone using the Hue File Browser