HADOOP
An open-source framework for distributed storage and processing of large datasets across clusters of computers.
HDFS
Hadoop Distributed File System; a scalable and fault-tolerant storage system for managing large datasets.
MAPREDUCE
A programming model for processing large data sets with a distributed algorithm on a cluster.
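The map, shuffle, and reduce stages can be sketched in plain Python with the classic word-count example. This is an illustrative toy, not Hadoop's actual Java API; in a real cluster each split is mapped on a different node and the framework performs the shuffle.

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit a (word, 1) pair for every word in the input split
    for word in split.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["big data big cluster", "data node data"]
pairs = [pair for s in splits for pair in map_phase(s)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 3, "cluster": 1, "node": 1}
```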
YARN
Yet Another Resource Negotiator; Hadoop's cluster resource manager, which allocates resources and schedules applications.
DATA NODE
A worker node in HDFS that stores the actual data blocks and serves read/write requests from clients.
NAME NODE
The master node in HDFS that manages the filesystem namespace and metadata and regulates client access to files.
CLUSTER
A group of interconnected computers that work together to process large datasets.
SPLIT
A division of input data into manageable chunks for processing in MapReduce.
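A minimal sketch of the splitting idea, assuming a fixed 128 MB split size (a common HDFS block size). Real InputFormat implementations also take record boundaries into account, which this toy ignores.

```python
def input_splits(file_size, split_size=128 * 1024 * 1024):
    # Divide a file of `file_size` bytes into (offset, length) splits,
    # each processed by its own map task.
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

# A 300 MB file yields three splits: 128 MB, 128 MB, and 44 MB.
splits = input_splits(300 * 1024 * 1024)
```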
JOB
A unit of work submitted to the Hadoop cluster for processing.
PIG
A high-level platform for creating programs that run on Hadoop, using a language called Pig Latin.
HIVE
A data warehouse infrastructure built on Hadoop for querying and managing large datasets using a SQL-like language (HiveQL).
SPARK
An open-source data processing engine that can run on Hadoop, known for its speed and ease of use.
DATA ANALYTICS
The science of analyzing raw data to uncover trends, patterns, and insights.
EXPLORATORY DATA ANALYSIS
An approach for analyzing datasets to summarize their main characteristics, often using visual methods.
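A tiny illustration using only Python's standard library: computing summary statistics is typically the first EDA step before any visualization. The sample values here are made up for the example.

```python
import statistics

values = [12, 15, 11, 48, 13, 14, 12]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": round(statistics.stdev(values), 2),
    "min": min(values),
    "max": max(values),
}
# A large gap between mean and median hints at an outlier (48 here),
# which a histogram or box plot would make visible at a glance.
```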
DATA VISUALIZATION
The graphical representation of information and data to communicate insights clearly.
FAULT TOLERANCE
The ability of a system to continue operating in the event of a failure of some of its components.
REPLICATION
The process of duplicating data across multiple nodes in HDFS for reliability and availability.
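A toy sketch of the placement idea: each block is copied to several distinct DataNodes (HDFS defaults to a replication factor of 3). The round-robin choice below is a simplification; real HDFS uses a rack-aware placement policy.

```python
def place_replicas(blocks, nodes, replication=3):
    # Assign each block to `replication` distinct nodes, round-robin.
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
plan = place_replicas(["blk_0", "blk_1"], nodes)
# Every block lives on 3 distinct DataNodes, so the failure of any
# single node never destroys the only copy of a block.
```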
ETL
Extract, Transform, Load; a process for integrating data from multiple sources into a single target store, such as a data warehouse.
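The three stages can be shown end to end with the standard library alone. The CSV source, table name, and cleaning rules below are all invented for illustration; production pipelines extract from real systems and load into a warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (in-memory here)
raw = "name,amount\nalice,10\nbob,\ncarol,25\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: drop incomplete records and normalize names and types
clean = [(r["name"].title(), int(r["amount"])) for r in rows if r["amount"]]

# Load: write the cleaned records into a target database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
# total == 35
```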
SCALABILITY
The capability of a system to handle a growing amount of work by adding resources, such as more nodes to a cluster.
DATA LAKE
A centralized repository that allows you to store all your structured and unstructured data at any scale.
BIG DATA
Extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations.
DATA SCIENCE
A multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from data.
MACHINE LEARNING
A subset of AI that uses statistical techniques to enable computers to improve at tasks with experience.
CASE STUDY
An in-depth analysis of a particular instance or example within a real-world context.
API
Application Programming Interface; a set of protocols for building and interacting with software applications.