1. What is Big Data?

Definition: Big Data refers to datasets so large, fast-moving, or varied that traditional single-machine tools (such as a relational database on one server) cannot store or process them efficiently.
Characteristics (5 V’s):
- Volume: The sheer amount of data, often terabytes to petabytes.
- Velocity: Speed at which data is generated and must be processed.
- Variety: Different types of data (structured, unstructured, semi-structured).
- Veracity: Data accuracy and reliability.
- Value: Useful insights derived from data.

2. Examples of Big Data

Social media posts (Twitter, Facebook).
E-commerce transactions (Amazon, Flipkart).
IoT devices (smart home sensors).
Healthcare records.

3. Big Data Technologies

Hadoop: Open-source framework for distributed storage and processing of large datasets.
- Key components:
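  - HDFS (Hadoop Distributed File System): Stores data as blocks replicated across the nodes of a cluster.
  - MapReduce: Processes data in parallel through a map phase and a reduce phase.
Spark: In-memory data processing engine, typically faster than MapReduce for iterative workloads.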
NoSQL Databases:
- Examples: MongoDB, Cassandra, HBase.
- Designed for unstructured and semi-structured data with flexible schemas.
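
Worked example (MapReduce): the map and reduce phases named above can be sketched in plain Python as a word count. This is only a single-machine illustration of the idea, not Hadoop's actual API; the function names map_phase, shuffle, and reduce_phase and the sample lines are assumptions made for this sketch.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # In a real cluster each line (or block) would be mapped on a different node.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input, standing in for lines of a file stored in HDFS.
lines = ["big data is big", "data is everywhere"]
counts = word_count(lines)
print(counts)
```

In Hadoop the same three steps run across many machines, with the framework handling the shuffle between map and reduce.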

4. Big Data Tools

Storage: HDFS, Amazon S3.
Processing: Hadoop MapReduce, Spark, Apache Flink (stream processing).
Analysis: Hive, Pig.
Visualization: Tableau, Power BI.

5. Types of Data

Structured Data: Data with a fixed schema of rows and columns (e.g., SQL tables).
Unstructured Data: Data with no predefined model (e.g., images, videos, free text).
Semi-structured Data: No rigid schema, but self-describing tags or keys (e.g., JSON, XML).
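
Worked example (types of data): the short sketch below contrasts the three types using only Python's standard library. All sample values (names, amounts, the review text) are made up for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns, so column access is uniform.
table_text = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
rows = list(csv.DictReader(io.StringIO(table_text)))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: records carry their own keys, and fields may be missing.
json_text = '[{"id": 1, "name": "Asha", "tags": ["new"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)
tags = [record.get("tags", []) for record in records]

# Unstructured: free text has no fields at all, so meaning must be extracted.
review = "Delivery was late but the product is great"
mentions_late = "late" in review.lower()

print(total, tags, mentions_late)
```

Note how the structured data can be summed directly, the semi-structured data needs a default for the missing key, and the unstructured text needs some form of search or text analysis.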

6. Key Big Data Concepts

Distributed Computing: Data processing across multiple servers.
Data Mining: Extracting useful patterns from large datasets.
Machine Learning: Predictive modeling and pattern recognition.
Data Warehousing: Storing integrated data from multiple sources in a central repository for reporting and analysis.
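
Worked example (distributed computing): one common way to spread work across servers is to assign each record to a node by hashing its key, let every node compute a partial result, and then combine the partial results. The sketch below simulates this on one machine; the helper names node_for and partition, the node count, and the sample records are assumptions for illustration.

```python
import hashlib

def node_for(key, num_nodes):
    """Pick a node index for a record key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(records, num_nodes):
    """Split (key, value) records into one list per simulated node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes

# Hypothetical records: (user id, purchase amount).
records = [("user1", 10), ("user2", 20), ("user3", 30), ("user4", 40)]
nodes = partition(records, num_nodes=3)

# Each simulated node sums its own share; the partial sums are then combined.
partial_sums = [sum(value for _, value in node) for node in nodes]
total = sum(partial_sums)
print(total)  # 100
```

The total is the same as summing on one machine, but each node only had to look at its own share of the data.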

7. Big Data Analytics

Descriptive Analytics: What happened?
Predictive Analytics: What will happen?
Prescriptive Analytics: What should we do?
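
Worked example (analytics types): the three questions can be shown on one tiny dataset. The monthly sales figures, the stock level, and the reorder rule below are hypothetical, and the straight-line forecast is just one simple choice of predictive model.

```python
from statistics import mean

# Hypothetical monthly sales for months 0 to 3.
sales = [100, 120, 140, 160]

# Descriptive: what happened? Summarize the past.
average = mean(sales)

# Predictive: what will happen? Fit a least-squares line and extend it.
months = range(len(sales))
month_mean = mean(months)
slope = sum((m - month_mean) * (s - average) for m, s in zip(months, sales)) / sum(
    (m - month_mean) ** 2 for m in months
)
intercept = average - slope * month_mean
forecast = intercept + slope * len(sales)  # prediction for the next month

# Prescriptive: what should we do? Turn the forecast into an action.
stock_on_hand = 150  # hypothetical current inventory
order_quantity = max(0, forecast - stock_on_hand)

print(average, forecast, order_quantity)
```

With these numbers the average is 130, the forecast for the next month is 180, and the suggested order is 30 units.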

8. Challenges of Big Data

Data storage and management.
Ensuring data privacy and security.
Analyzing real-time data with low latency.
Lack of skilled professionals.

9. Applications of Big Data

Healthcare: Personalized medicine, disease prediction.
Finance: Fraud detection, risk management.
Retail: Customer behavior analysis, recommendation systems.
Transport: Traffic prediction, route optimization.
Government: Smart cities, policy analysis.

10. Exam Quick Tips

Remember the 5 V’s of Big Data.
Focus on technologies like Hadoop and Spark.
Differentiate between structured, unstructured, and semi-structured data.
Know examples of Big Data applications.
Understand key analytics types: descriptive, predictive, prescriptive.

Cheat Sheet Summary

Frameworks: Hadoop (HDFS + MapReduce), Spark.
NoSQL Databases: MongoDB, Cassandra, HBase.
Analysis Tools: Hive, Pig.
Key Applications: Healthcare, finance, retail, transport.

Multiple Choice Questions on Big Data

1. What are the 5 V’s of Big Data?

A) Volume, Velocity, Variety, Veracity, Value
B) Volume, Value, Visualization, Variety, Variance
C) Value, Volume, Variety, Verification, Velocity
D) Visualization, Variety, Veracity, Value, Volume

Answer: A

2. Which of the following is an open-source framework for distributed storage and processing of Big Data?

A) Spark
B) Hadoop
C) Tableau
D) SQL Server

Answer: B

3. What does HDFS stand for in the context of Big Data?

A) High Distributed File Storage
B) Hadoop Distributed File System
C) Hybrid Data File Storage
D) Hadoop Data Flow System

Answer: B

4. What type of data are NoSQL databases mainly designed to handle?

A) Structured data only
B) Unstructured and semi-structured data
C) Processed and raw data
D) Financial data exclusively

Answer: B

5. What is the main purpose of Apache Spark in Big Data?

A) Data visualization
B) In-memory data processing
C) Data storage
D) Predictive analytics

Answer: B

6. Which of the following is NOT one of the 5 V’s of Big Data?

A) Volume
B) Velocity
C) Variability
D) Variety

Answer: C

7. What is the role of MapReduce in Big Data?

A) Data visualization
B) Distributed processing of data
C) Managing databases
D) Analyzing data in real-time

Answer: B

8. Which database is commonly used in Big Data for unstructured or semi-structured data?

A) MySQL
B) Oracle
C) MongoDB
D) SQL Server

Answer: C

9. Which type of analytics focuses on “What should we do?”

A) Descriptive Analytics
B) Diagnostic Analytics
C) Predictive Analytics
D) Prescriptive Analytics

Answer: D

10. Which Big Data tool is used for data visualization?

A) Hive
B) Tableau
C) Pig
D) Cassandra

Answer: B

11. What is an example of semi-structured data?

A) Video files
B) SQL tables
C) JSON files
D) Text documents

Answer: C

12. Which Big Data technology is known for its distributed storage and scalability?

A) Tableau
B) Hadoop
C) Excel
D) Oracle

Answer: B

13. What challenge does Big Data face with real-time analysis?

A) Storage capacity
B) Privacy concerns
C) High latency
D) Lack of data integrity

Answer: C

14. Which application area uses Big Data for traffic prediction?

A) Retail
B) Transport
C) Healthcare
D) Government

Answer: B

15. What is distributed computing?

A) Running multiple computations on a single server
B) Processing data across multiple servers
C) Centralizing data for faster access
D) Encrypting data for secure storage

Answer: B