Hadoop Practical 2026 | BCA/BBA Sem 6 Important Questions 🔥


SET 1


1. Create a directory named /student_data in HDFS.

2. Create a subdirectory /student_data/sem6.

3. Display the directory structure


1. Create a directory /student_data in HDFS

hdfs dfs -mkdir /student_data


2. Create a subdirectory /student_data/sem6

hdfs dfs -mkdir /student_data/sem6


3. Display the directory structure

hdfs dfs -ls -R /student_data

👉 -R lists the directory tree recursively, so /student_data/sem6 appears inside /student_data.

SET 2

 

1. Copy a file students.txt from /student_data to /backup directory in HDFS.

2. Verify the copied file.

3. Display the content of the copied file.

 


1. Copy students.txt from /student_data to /backup

hdfs dfs -mkdir -p /backup

hdfs dfs -cp /student_data/students.txt /backup/

👉 The -mkdir -p is only needed if /backup does not exist yet.


2. Verify the copied file

hdfs dfs -ls /backup

👉 This will show if students.txt is present in /backup.


3. Display the content of the copied file

hdfs dfs -cat /backup/students.txt


💡 Alternative (for large files — shows only the last 1 KB)

hdfs dfs -tail /backup/students.txt


 

SET 3

1. Create directory /bigdata/lab.

2. Change permission of the directory to read, write, execute for owner only.

3. Display directory permissions.


1. Create directory /bigdata/lab

hdfs dfs -mkdir -p /bigdata/lab


2. Change permission (rwx for owner only → 700)

hdfs dfs -chmod 700 /bigdata/lab


3. Display directory permissions

hdfs dfs -ls /bigdata


💡 Sample Output (Example)

drwx------   - user supergroup          0 2026-03-26  /bigdata/lab

👉 drwx------ means:

Owner: read, write, execute

Group & Others: no permission
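The octal-to-rwx mapping that chmod uses can be sketched in Python. This is an illustration only — `mode_to_rwx` is a made-up helper for understanding the notation, not a Hadoop API:

```python
# Minimal sketch: how an octal mode (as used by hdfs dfs -chmod)
# maps to the rwx string shown by hdfs dfs -ls.
def mode_to_rwx(octal_mode):
    out = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (octal_mode >> shift) & 0b111
        out.append("".join(f if bits & (4 >> i) else "-"
                           for i, f in enumerate("rwx")))
    return "".join(out)

print(mode_to_rwx(0o700))  # rwx------
print(mode_to_rwx(0o755))  # rwxr-xr-x
```

👉 So 700 gives the owner all three bits (4+2+1) and nothing to group/others — exactly the drwx------ shown above.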


 

SET 4

1. Upload a file students.txt from the local system to /student_data in HDFS.

2. Verify that the file is uploaded successfully.

3. Display the content of the uploaded file.



1. Upload students.txt from local system to /student_data

hdfs dfs -put students.txt /student_data/

👉 Alternative command:

hdfs dfs -copyFromLocal students.txt /student_data/


2. Verify the file is uploaded successfully

hdfs dfs -ls /student_data

👉 Check if students.txt appears in the list.


3. Display the content of the uploaded file

hdfs dfs -cat /student_data/students.txt


💡 Optional (shows only the first 1 KB — available in Hadoop 3.x and later)

hdfs dfs -head /student_data/students.txt


 

SET 5

1. Create a directory /test_directory in HDFS.

2. Delete the directory using the HDFS command.

3. Verify that the directory is deleted.

 


1. Create directory /test_directory

hdfs dfs -mkdir /test_directory


2. Delete the directory

hdfs dfs -rm -r /test_directory

👉 -r is used to remove directories.


3. Verify the directory is deleted

hdfs dfs -ls /

👉 Check that /test_directory is not present in the list.


💡 Alternative (permanent delete — bypasses the trash)

hdfs dfs -rm -r -skipTrash /test_directory


SET 6

 

1. Create a text file data.txt containing sample text data.

2. Create an input directory /input_wc in HDFS.

3. Upload the file data.txt from the local file system to /input_wc in HDFS.

4. Execute the WordCount MapReduce program using Hadoop.

5. Store the output in /output_wc directory in HDFS.

6. Display the output result showing each word and its frequency.


 


🧑‍🏫 HADOOP WORDCOUNT (WINDOWS STEP BY STEP)


STEP 1: Create a Text File (Windows)

👉 Method 1 (Easiest)

Right-click on Desktop

Click New → Text Document

Rename it → data.txt

👉 Open the file and type:

hello world hello hadoop big data hadoop

👉 Press CTRL + S to save


👉 Method 2 (Using Command Prompt)

Open Command Prompt and type:

echo hello world hello hadoop big data hadoop > data.txt

👉 Check file:

type data.txt


STEP 2: Start Hadoop (Windows)

👉 Open Command Prompt and run:

start-dfs.cmd

start-yarn.cmd

👉 Verify:

jps

You should see:

NameNode

DataNode

ResourceManager

NodeManager


STEP 3: Create HDFS Input Directory

hdfs dfs -mkdir /input_wc

👉 Check:

hdfs dfs -ls /


STEP 4: Upload File to HDFS

hdfs dfs -put data.txt /input_wc/

👉 Verify:

hdfs dfs -ls /input_wc


STEP 5: Run WordCount Program

hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar wordcount /input_wc /output_wc

👉 Windows CMD does not expand the * wildcard — replace it with the actual file name from your installation, e.g. hadoop-mapreduce-examples-3.3.6.jar.


⚠️ If Error: "Output directory /output_wc already exists"

hdfs dfs -rm -r /output_wc

👉 Then run the jar command again.


STEP 6: Check Output Folder

hdfs dfs -ls /output_wc


STEP 7: Display Output Result

hdfs dfs -cat /output_wc/part-r-00000


🎯 FINAL OUTPUT (Example)

big 1

data 1

hadoop 2

hello 2

world 1
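The counting that the wordcount job performs can be sketched locally. This Python snippet only illustrates the map/reduce logic (emit each word with count 1, then sum per word) — it is not the actual Hadoop Java job:

```python
from collections import Counter

text = "hello world hello hadoop big data hadoop"

# map: emit (word, 1) for every word; shuffle groups by word;
# reduce: sum the 1s — Counter does all three in one step here
counts = Counter(text.split())

for word in sorted(counts):
    print(word, counts[word])
```

👉 Running this prints the same five lines as the final output above (big 1, data 1, hadoop 2, hello 2, world 1).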


 

SET 7

1. Create a dataset file sales.txt containing records of product sales.

2. Upload the dataset file into HDFS input directory.

3. Execute a MapReduce program to calculate the total sales of each product.

4. Store the output result in the HDFS output directory.

5. Display the result showing product name and total sales.


🧑‍🏫 HADOOP TOTAL SALES (WINDOWS ONLY)


STEP 1: Create Dataset File sales.txt (Windows)

👉 Using Notepad:

Right-click → New → Text Document

Rename → sales.txt

Open and type:

laptop 50000

mobile 20000

laptop 30000

tablet 15000

mobile 10000

Press CTRL + S to save


STEP 2: Start Hadoop (Windows)

start-dfs.cmd

start-yarn.cmd

👉 Verify:

jps


STEP 3: Create HDFS Input Directory

hdfs dfs -mkdir /input_sales


STEP 4: Upload File to HDFS

hdfs dfs -put sales.txt /input_sales/

👉 Verify:

hdfs dfs -ls /input_sales


STEP 5: Run MapReduce Program

hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar wordcount /input_sales /output_sales

👉 Replace * with the actual jar version (Windows CMD does not expand wildcards).

⚠️ Note: the bundled wordcount job only counts how many times each word appears — it does not add up the sale amounts. Computing the real total per product needs a custom MapReduce job that sums the second field for each product key.


⚠️ If error "Output directory /output_sales already exists":

hdfs dfs -rm -r /output_sales


STEP 6: Check Output

hdfs dfs -ls /output_sales


STEP 7: Display Result

hdfs dfs -cat /output_sales/part-r-00000


🎯 SAMPLE OUTPUT (word counts, not sales totals — wordcount also counts the numeric tokens)

10000 1

15000 1

20000 1

30000 1

50000 1

laptop 2

mobile 2

tablet 1
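What the question actually asks for — total sales per product — can be sketched in Python. This illustrates the reduce-side summing a custom MapReduce job would perform; the record list is just the sample dataset above, and the snippet is not Hadoop code:

```python
from collections import defaultdict

# the sales.txt records as (product, amount) pairs
records = [("laptop", 50000), ("mobile", 20000), ("laptop", 30000),
           ("tablet", 15000), ("mobile", 10000)]

# reduce step: group by product key and sum the amounts
totals = defaultdict(int)
for product, amount in records:
    totals[product] += amount

for product in sorted(totals):
    print(product, totals[product])
# laptop 80000
# mobile 30000
# tablet 15000
```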


 

SET 8

Q1. Create a dataset student_marks.txt containing fields (RollNo, Name, Marks).

Perform the following operations using Apache Pig: [15]

1. Load the dataset into Pig.

2. Display all student records.

3. Filter students who scored more than 70 marks.

4. Store the filtered result in an HDFS output directory.

 


🧑‍🏫 APACHE PIG PRACTICAL (STUDENT MARKS)


STEP 1: Create Dataset File student_marks.txt (Windows)

👉 Using Notepad:

Right-click → New → Text Document

Rename → student_marks.txt

Open and write:

101,John,75

102,Amit,65

103,Riya,85

104,Neha,55

105,Rahul,90

👉 Save (CTRL + S)


STEP 2: Upload File to HDFS

hdfs dfs -mkdir /pig_input

hdfs dfs -put student_marks.txt /pig_input/

👉 Verify:

hdfs dfs -ls /pig_input


STEP 3: Start Apache Pig (Windows)

pig

👉 The Pig Grunt shell will open:

grunt>


STEP 4: Load Dataset into Pig

students = LOAD '/pig_input/student_marks.txt'

USING PigStorage(',')

AS (rollno:int, name:chararray, marks:int);


STEP 5: Display All Records

DUMP students;

👉 Output:

(101,John,75)

(102,Amit,65)

...


STEP 6: Filter Students (Marks > 70)

filtered_students = FILTER students BY marks > 70;


STEP 7: Display Filtered Data

DUMP filtered_students;

👉 Output:

(101,John,75)

(103,Riya,85)

(105,Rahul,90)


STEP 8: Store Result in HDFS

STORE filtered_students INTO '/pig_output' USING PigStorage(',');


STEP 9: Verify Output

Exit Pig:

quit

Check in HDFS:

hdfs dfs -ls /pig_output

Display result:

hdfs dfs -cat /pig_output/part-*

👉 A FILTER-only job is map-only, so the output file is usually named part-m-00000; the wildcard matches it either way.


🎯 FINAL OUTPUT (Stored)

101,John,75

103,Riya,85

105,Rahul,90


 

SET 9

Q1. Create a dataset employee.txt containing (EmpID, Name, Department, Salary).

Perform the following operations using Apache Pig: [15]

1. Load the dataset into Pig.

2. Filter employees from IT department.

3. Display Name and Salary of filtered employees.

4. Store the result in HDFS.


🧑‍🏫 APACHE PIG PRACTICAL (EMPLOYEE DATA)


STEP 1: Create Dataset File employee.txt (Windows)

👉 Using Notepad:

Right-click → New → Text Document

Rename → employee.txt

Open and write:

101,John,IT,50000

102,Amit,HR,40000

103,Riya,IT,60000

104,Neha,Finance,45000

105,Rahul,IT,70000

👉 Save (CTRL + S)


👉 OR using Command Prompt:

echo 101,John,IT,50000 > employee.txt

echo 102,Amit,HR,40000 >> employee.txt

echo 103,Riya,IT,60000 >> employee.txt

echo 104,Neha,Finance,45000 >> employee.txt

echo 105,Rahul,IT,70000 >> employee.txt


STEP 2: Upload File to HDFS

hdfs dfs -mkdir /pig_input

hdfs dfs -put employee.txt /pig_input/

👉 Verify:

hdfs dfs -ls /pig_input


STEP 3: Start Apache Pig

pig

👉 You will see:

grunt>


STEP 4: Load Dataset into Pig

emp = LOAD '/pig_input/employee.txt'

USING PigStorage(',')

AS (id:int, name:chararray, dept:chararray, salary:int);


STEP 5: Filter Employees from IT Department

it_emp = FILTER emp BY dept == 'IT';


STEP 6: Display Name and Salary

result = FOREACH it_emp GENERATE name, salary;

👉 Display:

DUMP result;

👉 Output:

(John,50000)

(Riya,60000)

(Rahul,70000)


STEP 7: Store Result in HDFS

STORE result INTO '/pig_output_emp' USING PigStorage(',');


STEP 8: Verify Output

Exit Pig:

quit

Check output:

hdfs dfs -ls /pig_output_emp

Display:

hdfs dfs -cat /pig_output_emp/part-*

👉 The wildcard covers both part-m-00000 (map-only job) and part-r-00000.


🎯 FINAL OUTPUT

John,50000

Riya,60000

Rahul,70000


 

SET 10

Create a dataset movie_rating.txt containing (MovieName, User, Rating).

Perform the following operations using Apache Pig : [15]

1. Load the dataset into Pig.

2. Group the data by MovieName.

3. Calculate average rating for each movie.


STEP 1: Create Dataset File movie_rating.txt (Windows)

👉 Using Notepad:

Right-click → New → Text Document

Rename → movie_rating.txt

Open and write:

Avengers,User1,4

Avengers,User2,5

Titanic,User3,5

Titanic,User4,4

Avatar,User5,3

Avatar,User6,4

👉 Save (CTRL + S)


STEP 2: Upload File to HDFS

hdfs dfs -mkdir /pig_input

hdfs dfs -put movie_rating.txt /pig_input/

👉 Verify:

hdfs dfs -ls /pig_input


STEP 3: Start Apache Pig

pig

👉 You will see:

grunt>


STEP 4: Load Dataset into Pig

movies = LOAD '/pig_input/movie_rating.txt'

USING PigStorage(',')

AS (moviename:chararray, user:chararray, rating:int);


STEP 5: Group Data by MovieName

grp_movies = GROUP movies BY moviename;


STEP 6: Calculate Average Rating

avg_rating = FOREACH grp_movies GENERATE group AS movie, AVG(movies.rating) AS avg_rating;


STEP 7: Display Result

DUMP avg_rating;


🎯 FINAL OUTPUT (Example)

(Avengers,4.5)

(Titanic,4.5)

(Avatar,3.5)
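The GROUP … AVG pipeline above computes, per movie, the mean of its ratings. A local Python sketch of the same grouping (illustration only, using the sample data — not Pig code):

```python
from collections import defaultdict

ratings = [("Avengers", "User1", 4), ("Avengers", "User2", 5),
           ("Titanic", "User3", 5), ("Titanic", "User4", 4),
           ("Avatar", "User5", 3), ("Avatar", "User6", 4)]

# GROUP movies BY moviename: collect each movie's ratings
groups = defaultdict(list)
for movie, _user, rating in ratings:
    groups[movie].append(rating)

# AVG(movies.rating): mean per group
for movie, vals in groups.items():
    print(movie, sum(vals) / len(vals))
# Avengers 4.5, Titanic 4.5, Avatar 3.5
```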


SET 11

Q1. Perform the following tasks using Apache Pig and User Defined Function (UDF): [15]

1. Load the dataset into Pig using PigStorage.

2. Display all employee records.

3. Filter employees belonging to the IT department.

4. Create and apply a User Defined Function (UDF) to calculate 10% bonus on salary.

5. Display the employee name, salary, and calculated bonus.

 


🧑‍🏫 APACHE PIG + UDF (EMPLOYEE BONUS)


STEP 1: Create Dataset employee.txt (Windows)

👉 Using Notepad:

101,John,IT,50000

102,Amit,HR,40000

103,Riya,IT,60000

104,Neha,Finance,45000

105,Rahul,IT,70000

Save the file.


STEP 2: Upload File to HDFS

hdfs dfs -mkdir /pig_input

hdfs dfs -put employee.txt /pig_input/


STEP 3: Start Apache Pig

pig

👉 Pig shell:

grunt>


STEP 4: Load Dataset using PigStorage

emp = LOAD '/pig_input/employee.txt'

USING PigStorage(',')

AS (id:int, name:chararray, dept:chararray, salary:int);


STEP 5: Display All Records

DUMP emp;


STEP 6: Filter IT Department Employees

it_emp = FILTER emp BY dept == 'IT';


🧠 STEP 7: Create UDF (10% Bonus)

👉 Create Java file BonusUDF.java

import org.apache.pig.EvalFunc;

import org.apache.pig.data.Tuple;

import java.io.IOException;

 

public class BonusUDF extends EvalFunc<Double> {

    public Double exec(Tuple input) throws IOException {

        if (input == null || input.size() == 0)

            return null;

        try {

            // salary is declared as int in the Pig schema, so it arrives as an Integer;
            // going through Number avoids a ClassCastException

            Number salary = (Number) input.get(0);

            return salary.doubleValue() * 0.10;   // 10% bonus

        } catch (Exception e) {

            return null;

        }

    }

}


👉 Compile and Create JAR (Windows CMD)

javac -cp %PIG_HOME%\lib\pig.jar BonusUDF.java

jar -cvf bonus.jar BonusUDF.class

👉 The Pig jar name varies by version (e.g. pig-0.17.0-core-h2.jar) — point -cp at the jar that ships with your installation.


STEP 8: Register UDF in Pig

REGISTER 'bonus.jar';

DEFINE bonus BonusUDF();


STEP 9: Apply UDF

result = FOREACH it_emp GENERATE name, salary, bonus(salary);


STEP 10: Display Result

DUMP result;


🎯 FINAL OUTPUT

(John,50000,5000.0)

(Riya,60000,6000.0)

(Rahul,70000,7000.0)


 

SET 12

Q1. Write an R program to print the numbers from 1 to 100, printing "Fizz" for multiples of 3, "Buzz" for multiples of 5 and "FizzBuzz" for multiples of both.


🧑‍🏫 R Program: FizzBuzz (1 to 100)

for (i in 1:100) {

 

  if (i %% 3 == 0 && i %% 5 == 0) {

    print("FizzBuzz")

   

  } else if (i %% 3 == 0) {

    print("Fizz")

   

  } else if (i %% 5 == 0) {

    print("Buzz")

   

  } else {

    print(i)

   

  }

}


💡 Explanation (Simple)

%% → Modulus operator (gives remainder)

i %% 3 == 0 → divisible by 3

i %% 5 == 0 → divisible by 5

Both true → print "FizzBuzz"


🎯 Sample Output

[1] 1

[1] 2

[1] "Fizz"

[1] 4

[1] "Buzz"

[1] "Fizz"

...

[1] "FizzBuzz"



SET 13

 

Q1. Write an R program to create a vector of a specified type and length. Create vectors of numeric, complex, logical and character types of length 6.


🧑‍🏫 R Program: Create Vectors of Different Types

# Numeric vector (length 6)

numeric_vec <- numeric(6)

 

# Complex vector (length 6)

complex_vec <- complex(6)

 

# Logical vector (length 6)

logical_vec <- logical(6)

 

# Character vector (length 6)

char_vec <- character(6)

 

# Display all vectors

print(numeric_vec)

print(complex_vec)

print(logical_vec)

print(char_vec)


💡 Explanation (Simple)

·        numeric(6) → creates numeric vector of length 6 (default = 0)

·        complex(6) → creates complex vector (default = 0+0i)

·        logical(6) → creates logical vector (default = FALSE)

·        character(6) → creates character vector (default = empty "")


🎯 Output

[1] 0 0 0 0 0 0

[1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i

[1] FALSE FALSE FALSE FALSE FALSE FALSE

[1] "" "" "" "" "" ""


 

SET 14

 

Write an R program to create a list containing strings, numbers, vectors and logical values.


🧑‍🏫 R Program: Create a List with Mixed Data Types

# Create a list

my_list <- list(

  name = "Tanishq",                # string

  age = 21,                       # number

  marks = c(85, 90, 78),          # vector

  passed = TRUE                   # logical value

)

 

# Display the list

print(my_list)


💡 Explanation (Simple)

·        list() → used to store different types of data

·        "Tanishq" → string

·        21 → numeric value

·        c(85, 90, 78) → vector

·        TRUE → logical value


🎯 Output

$name

[1] "Tanishq"

$age

[1] 21

$marks

[1] 85 90 78

$passed

[1] TRUE


 

SET 15

Q1. Write an R program to sort a vector in ascending and descending order.


🧑‍🏫 R Program: Sort a Vector

# Create a vector

vec <- c(45, 12, 78, 23, 56, 9)

 

# Sort in ascending order

asc <- sort(vec)

 

# Sort in descending order

desc <- sort(vec, decreasing = TRUE)

 

# Display results

print("Ascending Order:")

print(asc)

 

print("Descending Order:")

print(desc)


💡 Explanation (Simple)

·        sort(vec) → sorts vector in ascending order

·        sort(vec, decreasing = TRUE) → sorts in descending order


🎯 Output

Ascending Order:

[1]  9 12 23 45 56 78

 

Descending Order:

[1] 78 56 45 23 12  9



SET 16

Write an R program to find Sum, Mean and Product of a Vector.


🧑‍🏫 R Program: Sum, Mean, Product

# Create a vector

vec <- c(2, 4, 6, 8, 10)

 

# Calculate Sum

sum_val <- sum(vec)

 

# Calculate Mean

mean_val <- mean(vec)

 

# Calculate Product

prod_val <- prod(vec)

 

# Display results

print(paste("Sum =", sum_val))

print(paste("Mean =", mean_val))

print(paste("Product =", prod_val))


💡 Explanation (Simple)

·        sum(vec) → adds all elements

·        mean(vec) → average of elements

·        prod(vec) → multiplies all elements


🎯 Output

Sum = 30

Mean = 6

Product = 3840


 

SET 17

Q1. Write an R program to create a list named s containing a sequence of 15 capital letters, starting from ‘E’.


🧑‍🏫 R Program: List of 15 Capital Letters starting from ‘E’

# Create sequence of letters from E

letters_seq <- LETTERS[5:(5+14)]   # E is 5th letter

 

# Create list named s

s <- list(letters_seq)

 

# Display the list

print(s)


💡 Explanation (Simple)

·        LETTERS → built-in vector of A to Z

·        LETTERS[5] → E

·        5:(5+14) → selects 15 letters from E

·        list() → creates a list


🎯 Output

[[1]]

[1] "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"


 

SET 18

Write an R program to extract all elements except the third element of the first vector of a given list.


🧑‍🏫 R Program: Extract All Elements Except 3rd (from First Vector in List)

# Create a list with vectors

my_list <- list(

  c(10, 20, 30, 40, 50),

  c(5, 15, 25)

)

 

# Extract first vector and remove 3rd element

result <- my_list[[1]][-3]

 

# Display result

print(result)


🎯 Output

[1] 10 20 40 50


 

SET 19

Q1. Create a dataset sales_data.txt containing the following fields: [15]

ProductID, ProductName, Quantity, Price

Example dataset:

101,Laptop,5,50000

102,Mobile,10,20000

103,Tablet,7,15000

104,Laptop,3,50000

105,Mobile,6,20000

Perform the following tasks using Apache Hive:

1. Create a Hive database named sales_db. (2 Marks)

2. Create a Hive table named sales with appropriate fields. (3 Marks)

3. Load the dataset sales_data.txt into the Hive table. (3 Marks)

4. Display all records from the table. (2 Marks)

5. Write a Hive query to calculate total quantity sold for each product using GROUP BY.

(3 Marks)

6. Display the result showing ProductName and Total Quantity. (2 Marks)

 


🧑‍🏫 APACHE HIVE PRACTICAL (SALES DATA)


STEP 1: Create Dataset sales_data.txt (Windows)

👉 Using Notepad:

101,Laptop,5,50000

102,Mobile,10,20000

103,Tablet,7,15000

104,Laptop,3,50000

105,Mobile,6,20000

👉 Save the file


STEP 2: Upload File to HDFS

hdfs dfs -mkdir /hive_input

hdfs dfs -put sales_data.txt /hive_input/


STEP 3: Start Hive

hive

👉 Hive shell will open:

hive>


STEP 4: Create Database (2 Marks)

CREATE DATABASE sales_db;

👉 Use database:

USE sales_db;


STEP 5: Create Table (3 Marks)

CREATE TABLE sales (

  product_id INT,

  product_name STRING,

  quantity INT,

  price INT

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ',';


STEP 6: Load Data into Table (3 Marks)

LOAD DATA INPATH '/hive_input/sales_data.txt'

INTO TABLE sales;

👉 Note: INPATH moves the file out of /hive_input into Hive's warehouse directory.


STEP 7: Display All Records (2 Marks)

SELECT * FROM sales;


STEP 8: Calculate Total Quantity (GROUP BY) (3 Marks)

SELECT product_name, SUM(quantity) AS total_quantity

FROM sales

GROUP BY product_name;


STEP 9: Display Result (2 Marks)

👉 Output:

Laptop 8

Mobile 16

Tablet 7


SET 20

Q1. Write an R program to create an array of three 3x2 matrices, each with 3 rows and 2 columns, from two given vectors of different lengths.



🧑‍🏫 R Program: Array of 3 Matrices (3×2)

# Create two vectors of different lengths

v1 <- c(1, 2, 3, 4, 5, 6)

v2 <- c(7, 8, 9, 10)

 

# Combine both vectors

data <- c(v1, v2)

 

# Create array (3 rows, 2 columns, 3 matrices)

arr <- array(data, dim = c(3, 2, 3))

 

# Display array

print(arr)


💡 Explanation (Simple)

·        v1 and v2 → two vectors of different lengths

·        c(v1, v2) → combines them into one vector

·        array() → creates multi-dimensional structure

·        dim = c(3,2,3) →

o   3 rows

o   2 columns

o   3 matrices

·        Only 10 values are supplied for 18 cells, so R recycles the combined vector from the beginning — that is why 1 and 2 appear again in the second matrix.


🎯 Output (Example)

, , 1

 

     [,1] [,2]

[1,]    1    4

[2,]    2    5

[3,]    3    6

 

, , 2

 

     [,1] [,2]

[1,]    7   10

[2,]    8    1

[3,]    9    2

 

, , 3

 

     [,1] [,2]

[1,]    3    6

[2,]    4    7

[3,]    5    8


SET 21

Q1. Write an R program to convert a given matrix to a list and print the list in ascending order.


🧑‍🏫 R Program: Convert Matrix to List & Sort Ascending

# Create a matrix

mat <- matrix(c(8, 3, 5, 1, 9, 2), nrow = 2)

 

# Convert matrix to list

lst <- as.list(mat)

 

# Sort list elements in ascending order

sorted_lst <- sort(unlist(lst))

 

# Display result

print(sorted_lst)


💡 Explanation (Simple)

·        matrix() → creates a matrix

·        as.list(mat) → converts matrix into a list

·        unlist(lst) → converts list to vector (for sorting)

·        sort() → sorts values in ascending order


🎯 Output

[1] 1 2 3 5 8 9


 

SET 22

Q1. Write an R program to create a data frame from four given vectors and display the structure and statistical summary of the data frame.


🧑‍🏫 R Program: Data Frame + Structure + Summary

# Create four vectors

id <- c(1, 2, 3, 4, 5)

name <- c("John", "Amit", "Riya", "Neha", "Rahul")

age <- c(21, 22, 20, 23, 21)

marks <- c(75, 68, 85, 70, 90)

 

# Create data frame

df <- data.frame(id, name, age, marks)

 

# Display data frame

print(df)

 

# Display structure

str(df)

 

# Display statistical summary

summary(df)


💡 Explanation (Simple)

·        data.frame() → combines vectors into table format

·        str(df) → shows structure (type of each column)

·        summary(df) → shows statistics (min, max, mean, etc.)


🎯 Output (Example)

Structure:

'data.frame': 5 obs. of  4 variables:

 $ id   : num  1 2 3 4 5

 $ name : chr  "John" "Amit" ...

 $ age  : num  21 22 20 23 21

 $ marks: num  75 68 85 70 90

Summary:

 id        age        marks

 Min.   :1   Min.   :20   Min.   :68

 Max.   :5   Max.   :23   Max.   :90

 Mean   :3   Mean   :21.4 Mean   :77.6


SET 23

Q1. Write an R program to create inner, outer, left and right joins (merge) from two given data frames.


🧑‍🏫 R Program: Joins (Merge)

# Create first data frame

df1 <- data.frame(

  id = c(1, 2, 3, 4),

  name = c("Amit", "John", "Riya", "Neha")

)

 

# Create second data frame

df2 <- data.frame(

  id = c(2, 3, 4, 5),

  marks = c(80, 90, 85, 70)

)

 

# Inner Join

inner_join <- merge(df1, df2, by = "id")

 

# Left Join

left_join <- merge(df1, df2, by = "id", all.x = TRUE)

 

# Right Join

right_join <- merge(df1, df2, by = "id", all.y = TRUE)

 

# Full Outer Join

outer_join <- merge(df1, df2, by = "id", all = TRUE)

 

# Display results

print("Inner Join:")

print(inner_join)

 

print("Left Join:")

print(left_join)

 

print("Right Join:")

print(right_join)

 

print("Outer Join:")

print(outer_join)


💡 Explanation (Simple)

·        merge() → used to join data frames

·        by = "id" → common column

·        all.x = TRUE → Left Join

·        all.y = TRUE → Right Join

·        all = TRUE → Full Outer Join


🎯 Output (Example)

Inner Join

  id name marks

1  2 John    80

2  3 Riya    90

3  4 Neha    85

Left Join

  id name marks

1  1 Amit   NA

2  2 John   80

3  3 Riya   90

4  4 Neha   85

Right Join

  id name marks

1  2 John   80

2  3 Riya   90

3  4 Neha   85

4  5 <NA>   70

Outer Join

  id name marks

1  1 Amit   NA

2  2 John   80

3  3 Riya   90

4  4 Neha   85

5  5 <NA>   70


 

SET 24

Q1. Using the inbuilt mtcars dataset perform the following

a. Display all the cars having mpg more than 20

b. Subset the dataset by mpg column for values greater than 15.0


🧑‍🏫 Using Dataset: mtcars

# Load dataset

data(mtcars)


(a) Display cars having mpg > 20

# Filter cars with mpg greater than 20

high_mpg <- mtcars[mtcars$mpg > 20, ]

 

# Display result

print(high_mpg)


💡 Explanation

·        mtcars$mpg > 20 → condition

·        [...] → filters rows

·        Displays all cars with mileage greater than 20


(b) Subset dataset by mpg column (mpg > 15)

# Subset only mpg column for values > 15

mpg_subset <- mtcars$mpg[mtcars$mpg > 15]

 

# Display result

print(mpg_subset)


 

 

SET 25

Q1. Using the inbuilt airquality dataset perform the following

 

a. Subset the dataset for the month July having Wind value greater than

10

b. Find the number of days having temperature less than 60



🧑‍🏫 Using Dataset: airquality

# Load dataset

data(airquality)


(a) Subset for July (Month = 7) & Wind > 10

# Subset data

july_data <- airquality[airquality$Month == 7 & airquality$Wind > 10, ]

 

# Display result

print(july_data)


💡 Explanation

·        Month == 7 → selects July data

·        Wind > 10 → selects rows with wind greater than 10

·        & → AND condition


(b) Number of Days with Temperature < 60

# Count days

count_days <- sum(airquality$Temp < 60, na.rm = TRUE)

 

# Display result

print(count_days)


💡 Explanation

·        Temp < 60 → condition

·        sum() → counts TRUE values

·        na.rm = TRUE → ignores missing values
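The same counting-with-a-logical-sum trick works in most languages: TRUE counts as 1, so summing a condition counts the matches. A Python sketch with made-up temperature values (not the real airquality data):

```python
# Analogue of sum(airquality$Temp < 60, na.rm = TRUE):
# each comparison yields True/False, and sum() counts the Trues.
temps = [57, 62, 59, 71, 55]  # hypothetical sample values

days_below_60 = sum(t < 60 for t in temps)
print(days_below_60)  # 3
```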


🎯 Final Understanding

·        Subset → filter dataset using conditions

·        Count → use sum() on logical condition


 

 

SET 26

Write an R program to draw an empty plot, and an empty plot that specifies the axes limits of the graphic.


🧑‍🏫 R Program: Empty Plot with Axes Limits

# Create an empty plot with defined axis limits

plot(1, type = "n",

     xlim = c(0, 10),

     ylim = c(0, 20),

     xlab = "X-Axis",

     ylab = "Y-Axis",

     main = "Empty Plot with Axis Limits")


💡 Explanation (Simple)

·        type = "n" → creates an empty plot (no points drawn)

·        xlim = c(0,10) → sets X-axis limits

·        ylim = c(0,20) → sets Y-axis limits

·        plot(1, ...) → initializes the plot


🎯 Output

👉 A blank graph will appear with:

·        X-axis from 0 to 10

·        Y-axis from 0 to 20

·        No data points (empty plot)


 

 

SET 27

Q1. Using inbuilt mtcars dataset

 

a) Create a bar plot for attribute mpg for all cars having 3 gears

b) Create a Histogram to show number of cars per carburetor type whose

mpg is greater than 20


🧑‍🏫 Using Dataset: mtcars

# Load dataset

data(mtcars)


(a) Bar Plot: mpg for Cars with 3 Gears

# Filter cars with 3 gears

gear3 <- mtcars[mtcars$gear == 3, ]

 

# Create bar plot

barplot(gear3$mpg,

        main = "MPG of Cars with 3 Gears",

        xlab = "Cars",

        ylab = "MPG",

        col = "orange")


💡 Explanation

·        mtcars$gear == 3 → selects cars with 3 gears

·        barplot() → displays mpg values


(b) Histogram: Cars per Carburetor Type (mpg > 20)

# Filter cars with mpg > 20

filtered <- mtcars[mtcars$mpg > 20, ]

 

# Create histogram of carburetor types

hist(filtered$carb,

     main = "Cars per Carburetor Type (MPG > 20)",

     xlab = "Carburetor Type",

     col = "lightblue")


💡 Explanation

·        mpg > 20 → selects efficient cars

·        carb → carburetor type

·        hist() → shows distribution


🎯 Final Understanding

·        Bar plot → mpg values for 3-gear cars

·        Histogram → distribution of carburetor types for cars with mpg > 20



SET 28

 

Q1. Using air quality dataset

 

a) Create a scatter plot to show the relationship between

ozone and wind values by giving appropriate value to color

argument

b) Create a bar plot to show the ozone level for all the days

having temperature greater than 70


🧑‍🏫 Using Dataset: airquality

# Load dataset

data(airquality)


(a) Scatter Plot: Ozone vs Wind (with color)

plot(airquality$Wind, airquality$Ozone,

     main = "Ozone vs Wind",

     xlab = "Wind",

     ylab = "Ozone",

     col = "red",

     pch = 19)


💡 Explanation

·        Wind → X-axis

·        Ozone → Y-axis

·        col = "red" → sets color of points

·        pch = 19 → solid dots


(b) Bar Plot: Ozone Level (Temp > 70)

# Filter data where temperature > 70

filtered_data <- airquality[airquality$Temp > 70, ]

 

# Create bar plot

barplot(filtered_data$Ozone,

        main = "Ozone Levels (Temp > 70)",

        xlab = "Days",

        ylab = "Ozone",

        col = "blue")


💡 Explanation

·        Temp > 70 → selects hot days

·        barplot() → shows ozone levels for those days


🎯 Final Understanding

·        Scatter plot → shows relationship between wind and ozone

·        Bar plot → shows ozone levels on hotter days


SET 29

Q1. Using inbuilt mtcars dataset

 

a. Create a bar plot that shows the number of cars of each gear type.

b. Draw a scatter plot showing the relationship between wt and mpg for

all the cars having 4 gears


🧑‍🏫 Using Inbuilt Dataset mtcars

# Load dataset

data(mtcars)


(a) Bar Plot: Number of Cars for Each Gear Type

# Count number of cars for each gear

gear_count <- table(mtcars$gear)

 

# Create bar plot

barplot(gear_count,

        main = "Number of Cars by Gear Type",

        xlab = "Gears",

        ylab = "Number of Cars",

        col = "lightgreen")


💡 Explanation

·        table(mtcars$gear) → counts cars for each gear

·        barplot() → creates bar graph


(b) Scatter Plot: wt vs mpg (Only 4 Gears)

# Filter cars with 4 gears

gear4 <- mtcars[mtcars$gear == 4, ]

 

# Create scatter plot

plot(gear4$wt, gear4$mpg,

     main = "Weight vs MPG (4 Gear Cars)",

     xlab = "Weight (wt)",

     ylab = "MPG",

     pch = 19,

     col = "blue")


💡 Explanation

·        mtcars$gear == 4 → selects only 4 gear cars

·        plot(x, y) → scatter plot

·        wt vs mpg → shows relationship


🎯 Final Understanding

·        Bar plot → shows count of cars by gear

·        Scatter plot → shows relationship between weight and mileage


SET 30

                                    

Q1. Draw boxplot to show the distribution of mpg values per number of gears


🧑‍🏫 R Program: Boxplot (mpg vs gears)

# Use built-in dataset

data(mtcars)

 

# Create boxplot

boxplot(mpg ~ gear, data = mtcars,

        main = "MPG Distribution by Number of Gears",

        xlab = "Number of Gears",

        ylab = "Miles Per Gallon (mpg)",

        col = "lightblue")


💡 Explanation (Simple)

  • mtcars → built-in dataset in R
  • mpg ~ gear → compares mpg with number of gears
  • boxplot() → creates boxplot
  • col → adds color for better visualization

🎯 Output

👉 A boxplot showing:

  • X-axis → number of gears (3, 4, 5)
  • Y-axis → mpg values
  • Each box shows distribution (min, Q1, median, Q3, max)

