Uncategorized – Page 4 – pandu’sblog

Hadoop – Command Reference

There are many more commands in “$HADOOP_HOME/bin/hadoop fs” than are demonstrated here, although these basic operations will get you started. Running ./bin/hadoop dfs with no additional arguments will list all the commands that can be run with the FsShell system. Furthermore, $HADOOP_HOME/bin/hadoop fs -help commandName will display a short usage summary for the operation in question, if you are…

Explain Hive concept and Data storage in Hadoop

This article explains what Hive partitioning is, why it is needed, and how it improves performance. Partitioning is an optimization technique in Hive that significantly improves query efficiency. Apache Hive is a data warehouse built on top of Hadoop that enables ad-hoc analysis of structured and semi-structured data. Let's look at Apache Hive partitioning in depth.

MapReduce and Sqoop in Hadoop

Welcome to the lesson ‘MapReduce and Sqoop’ of Big Data Hadoop tutorial which is a part of ‘big data training’ offered by OnlineItGuru. This lesson will focus on MapReduce and Sqoop in the Hadoop Ecosystem. Let’s look at the objectives of this lesson in the next section. Objectives: After completing this lesson, you will be…

Advanced Hive Concepts and Data File Partitioning

Welcome to the lesson ‘Advanced Hive Concept and Data File Partitioning’ which is a part of ‘big data hadoop online training’ offered by OnlineItGuru. Let us look at the objectives first. Objectives: After completing this lesson, you will be able to: improve query performance with the concepts of data file partitioning in Hive; define HIVE…

Spark Parallel Processing Tutorial

Objectives: After completing this lesson, you will be able to: explain how RDDs are distributed across a Spark cluster; analyze how Spark partitions file-based RDDs; explain how Spark executes RDD operations in parallel; explain how to control parallelization through partitioning; analyze how to view and monitor tasks and stages. Check out the big data hadoop…

Apache Flume and HBase in Hadoop

Welcome to the lesson ‘Apache Flume and HBase’ of Big Data Hadoop tutorial which is a part of ‘big data training’ offered by OnlineItGuru. This lesson will focus on Apache Flume and HBase in the Hadoop ecosystem. Let us look at the objectives of this lesson in the next section. Objectives: After completing this lesson, you…

Hadoop – HDFS overview and operations

The Hadoop File System was developed using a distributed file system design. It runs on commodity hardware. Unlike other distributed systems, HDFS is highly fault-tolerant and designed to use low-cost hardware. HDFS holds very large amounts of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are…

Types of Data Formats

Welcome to the lesson ‘Types of Data Formats’ which is a part of “big data and hadoop course” offered by OnlineItGuru. In this lesson, we will discuss the different types of data formats. Objectives: After completing this lesson, you will be able to: describe the different types of Hadoop file formats; explain data serialization in…

What is Hadoop? Modules of Hadoop?

What is Hadoop? Hadoop is an open-source framework from Apache that is used to store, process, and analyze very large volumes of data. Hadoop is written in Java and is not OLAP (online analytical processing). It is used for batch/offline processing. It is being used by Facebook, Yahoo, Google, Twitter, LinkedIn and many…