
SET 1
1. Create a directory /student_data in HDFS.
2. Create a subdirectory /student_data/sem6.
3. Display the directory structure.
✅ 1. Create a directory /student_data in HDFS
hdfs dfs -mkdir /student_data
✅ 2. Create a subdirectory /student_data/sem6
hdfs dfs -mkdir /student_data/sem6
✅ 3. Display the directory structure
hdfs dfs -ls -R /student_data
SET 2
1. Copy a file students.txt from /student_data to /backup directory in HDFS.
2. Verify the copied file.
3. Display the content of the copied file.
✅ 1. Copy students.txt from /student_data to /backup
hdfs dfs -cp /student_data/students.txt /backup/
✅ 2. Verify the copied file
hdfs dfs -ls /backup
👉 This will show if students.txt is present in /backup.
✅ 3. Display the content of the copied file
hdfs dfs -cat /backup/students.txt
💡 Alternative (view the end of a large file)
hdfs dfs -tail /backup/students.txt
SET 3
1. Create directory /bigdata/lab.
2. Change permission of the directory to read, write, execute for owner only.
3. Display directory permissions.
✅ 1. Create directory /bigdata/lab
hdfs dfs -mkdir -p /bigdata/lab
✅ 2. Change permission (rwx for owner only → 700)
hdfs dfs -chmod 700 /bigdata/lab
✅ 3. Display directory permissions
hdfs dfs -ls /bigdata
💡 Sample Output (Example)
drwx------ - user supergroup 0 2026-03-26 /bigdata/lab
👉 drwx------ means:
Owner: read, write, execute ✔
Group & Others: no permission ❌
SET 4
1. Upload a file students.txt from the local system to /student_data in HDFS.
2. Verify that the file is uploaded successfully.
3. Display the content of the uploaded file.
✅ 1. Upload students.txt from local system to /student_data
hdfs dfs -put students.txt /student_data/
👉 Alternative command:
hdfs dfs -copyFromLocal students.txt /student_data/
✅ 2. Verify the file is uploaded successfully
hdfs dfs -ls /student_data
👉 Check if students.txt appears in the list.
✅ 3. Display the content of the uploaded file
hdfs dfs -cat /student_data/students.txt
💡 Optional (show only the first part of a large file)
hdfs dfs -head /student_data/students.txt
SET 5
1. Create a directory /test_directory in HDFS.
2. Delete the directory using the HDFS command.
3. Verify that the directory is deleted.
✅ 1. Create directory /test_directory
hdfs dfs -mkdir /test_directory
✅ 2. Delete the directory
hdfs dfs -rm -r /test_directory
👉 -r is used to remove directories recursively.
✅ 3. Verify the directory is deleted
hdfs dfs -ls /
👉 Check that /test_directory is not present in the list.
💡 Alternative (permanent delete, bypassing the trash)
hdfs dfs -rm -r -skipTrash /test_directory
SET 6
1. Create a text file data.txt containing sample text data.
2. Create an input directory /input_wc in HDFS.
3. Upload the file data.txt from the local file system to /input_wc in HDFS.
4. Execute the WordCount MapReduce program using Hadoop.
5. Store the output in /output_wc directory in HDFS.
6. Display the output result showing each word and its frequency.
🧑🏫 HADOOP WORDCOUNT (WINDOWS STEP BY STEP)
✅ STEP 1: Create a Text File (Windows)
👉 Method 1 (Easiest)
Right-click on Desktop
Click New → Text Document
Rename it → data.txt
👉 Open the file and type:
hello world hello hadoop big data hadoop
👉 Press CTRL + S to save
👉 Method 2 (Using Command Prompt)
Open Command Prompt and type:
echo hello world hello hadoop big data hadoop > data.txt
👉 Check file:
type data.txt
✅ STEP 2: Start Hadoop (Windows)
👉 Open Command Prompt and run:
start-dfs.cmd
start-yarn.cmd
👉 Verify:
jps
✔ You should see:
NameNode
DataNode
ResourceManager
NodeManager
✅ STEP 3: Create HDFS Input Directory
hdfs dfs -mkdir /input_wc
👉 Check:
hdfs dfs -ls /
✅ STEP 4: Upload File to HDFS
hdfs dfs -put data.txt /input_wc/
👉 Verify:
hdfs dfs -ls /input_wc
✅ STEP 5: Run WordCount Program
hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar wordcount /input_wc /output_wc
⚠️ If Error: "Output directory already exists"
hdfs dfs -rm -r /output_wc
✅ STEP 6: Check Output Folder
hdfs dfs -ls /output_wc
✅ STEP 7: Display Output Result
hdfs dfs -cat /output_wc/part-r-00000
🎯 FINAL OUTPUT (Example)
big 1
data 1
hadoop 2
hello 2
world 1
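To make the result above concrete, here is a small Python sketch (an illustration only, not the actual Hadoop source) of what the WordCount job does: the map phase emits a (word, 1) pair for each word, the shuffle groups pairs by word, and the reduce phase sums the counts.

```python
from collections import defaultdict

def word_count(lines):
    """Simulate WordCount: emit 1 per word, then sum per word (map + shuffle + reduce)."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word] += 1          # reduce step: sum the emitted 1s
    return dict(sorted(counts.items()))

print(word_count(["hello world hello hadoop big data hadoop"]))
# {'big': 1, 'data': 1, 'hadoop': 2, 'hello': 2, 'world': 1}
```

Running this on the data.txt contents from STEP 1 reproduces the FINAL OUTPUT shown above.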
SET 7
1. Create a dataset file sales.txt containing records of product sales.
2. Upload the dataset file into HDFS input directory.
3. Execute a MapReduce program to calculate the total sales of each product.
4. Store the output result in the HDFS output directory.
5. Display the result showing product name and total sales.
🧑🏫 HADOOP TOTAL SALES (WINDOWS ONLY)
✅ STEP 1: Create Dataset File sales.txt (Windows)
👉 Using Notepad:
Right-click → New → Text Document
Rename → sales.txt
Open and type:
laptop 50000
mobile 20000
laptop 30000
tablet 15000
mobile 10000
Press CTRL + S to save
✅ STEP 2: Start Hadoop (Windows)
start-dfs.cmd
start-yarn.cmd
👉 Verify:
jps
✅ STEP 3: Create HDFS Input Directory
hdfs dfs -mkdir /input_sales
✅ STEP 4: Upload File to HDFS
hdfs dfs -put sales.txt /input_sales/
👉 Verify:
hdfs dfs -ls /input_sales
✅ STEP 5: Run MapReduce Program
hadoop jar %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-*.jar wordcount /input_sales /output_sales
👉 Note: the bundled wordcount example only counts how many times each word appears (each sale amount is also counted as a word); summing the actual sales amounts per product requires a custom MapReduce program.
⚠️ If the output directory already exists:
hdfs dfs -rm -r /output_sales
✅ STEP 6: Check Output
hdfs dfs -ls /output_sales
✅ STEP 7: Display Result
hdfs dfs -cat /output_sales/part-r-00000
🎯 SAMPLE OUTPUT (product lines only)
laptop 2
mobile 2
tablet 1
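The wordcount output above gives record counts, not rupee totals. To actually total the sales per product, a custom reducer would sum the amounts instead of the emitted 1s. A minimal Python sketch of that reducer logic (an illustration, not a Hadoop program):

```python
from collections import defaultdict

def total_sales(lines):
    """Sum the sale amount per product (what a sales-total reducer would do)."""
    totals = defaultdict(int)
    for line in lines:
        product, amount = line.split()   # e.g. "laptop 50000"
        totals[product] += int(amount)
    return dict(sorted(totals.items()))

records = ["laptop 50000", "mobile 20000", "laptop 30000",
           "tablet 15000", "mobile 10000"]
print(total_sales(records))
# {'laptop': 80000, 'mobile': 30000, 'tablet': 15000}
```

On the sales.txt data from STEP 1 this yields laptop 80000, mobile 30000, tablet 15000.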
SET 8
Q1. Create a dataset student_marks.txt containing fields (RollNo, Name, Marks).
Perform the following operations using Apache Pig: [15]
1. Load the dataset into Pig.
2. Display all student records.
3. Filter students who scored more than 70 marks.
4. Store the filtered result in an HDFS output directory.
🧑🏫 APACHE PIG PRACTICAL (STUDENT MARKS)
✅ STEP 1: Create Dataset File student_marks.txt (Windows)
👉 Using Notepad:
Right-click → New → Text Document
Rename → student_marks.txt
Open and write:
101,John,75
102,Amit,65
103,Riya,85
104,Neha,55
105,Rahul,90
👉 Save (CTRL + S)
✅ STEP 2: Upload File to HDFS
hdfs dfs -mkdir /pig_input
hdfs dfs -put student_marks.txt /pig_input/
👉 Verify:
hdfs dfs -ls /pig_input
✅ STEP 3: Start Apache Pig (Windows)
pig
👉 The Pig Grunt shell will open: grunt>
✅ STEP 4: Load Dataset into Pig
students = LOAD '/pig_input/student_marks.txt'
USING PigStorage(',')
AS (rollno:int, name:chararray, marks:int);
✅ STEP 5: Display All Records
DUMP students;
👉 Output:
(101,John,75)
(102,Amit,65)
...
✅ STEP 6: Filter Students (Marks > 70)
filtered_students = FILTER students BY marks > 70;
✅ STEP 7: Display Filtered Data
DUMP filtered_students;
👉 Output:
(101,John,75)
(103,Riya,85)
(105,Rahul,90)
✅ STEP 8: Store Result in HDFS
STORE filtered_students INTO '/pig_output' USING PigStorage(',');
✅ STEP 9: Verify Output
Exit Pig:
quit
Check in HDFS:
hdfs dfs -ls /pig_output
Display result:
hdfs dfs -cat /pig_output/part-*
👉 (A FILTER-only job is map-only, so the file is usually named part-m-00000.)
🎯 FINAL OUTPUT (Stored)
101,John,75
103,Riya,85
105,Rahul,90
SET 9
Q1. Create a dataset employee.txt containing (EmpID, Name, Department, Salary).
Perform the following operations using Apache Pig. [15]
1. Load the dataset into Pig.
2. Filter employees from the IT department.
3. Display Name and Salary of filtered employees.
4. Store the result in HDFS.
🧑🏫 APACHE PIG PRACTICAL (EMPLOYEE DATA)
✅ STEP 1: Create Dataset File employee.txt (Windows)
👉 Using Notepad:
Right-click → New → Text Document
Rename → employee.txt
Open and write:
101,John,IT,50000
102,Amit,HR,40000
103,Riya,IT,60000
104,Neha,Finance,45000
105,Rahul,IT,70000
👉 Save (CTRL + S)
👉 OR using Command Prompt:
echo 101,John,IT,50000 > employee.txt
echo 102,Amit,HR,40000 >> employee.txt
echo 103,Riya,IT,60000 >> employee.txt
echo 104,Neha,Finance,45000 >> employee.txt
echo 105,Rahul,IT,70000 >> employee.txt
✅ STEP 2: Upload File to HDFS
hdfs dfs -mkdir /pig_input
hdfs dfs -put employee.txt /pig_input/
👉 Verify:
hdfs dfs -ls /pig_input
✅ STEP 3: Start Apache Pig
pig
👉 You will see:
grunt>
✅ STEP 4: Load Dataset into Pig
emp = LOAD '/pig_input/employee.txt'
USING PigStorage(',')
AS (id:int, name:chararray, dept:chararray, salary:int);
✅ STEP 5: Filter Employees from IT Department
it_emp = FILTER emp BY dept == 'IT';
✅ STEP 6: Display Name and Salary
result = FOREACH it_emp GENERATE name, salary;
👉 Display:
DUMP result;
👉 Output:
(John,50000)
(Riya,60000)
(Rahul,70000)
✅ STEP 7: Store Result in HDFS
STORE result INTO '/pig_output_emp' USING PigStorage(',');
✅ STEP 8: Verify Output
Exit Pig:
quit
Check output:
hdfs dfs -ls /pig_output_emp
Display:
hdfs dfs -cat /pig_output_emp/part-*
🎯 FINAL OUTPUT
John,50000
Riya,60000
Rahul,70000
SET 10
Q1. Create a dataset movie_rating.txt containing (MovieName, User, Rating).
Perform the following operations using Apache Pig: [15]
1. Load the dataset into Pig.
2. Group the data by MovieName.
3. Calculate the average rating for each movie.
✅ STEP 1: Create Dataset File movie_rating.txt (Windows)
👉 Using Notepad:
Right-click → New → Text Document
Rename → movie_rating.txt
Open and write:
Avengers,User1,4
Avengers,User2,5
Titanic,User3,5
Titanic,User4,4
Avatar,User5,3
Avatar,User6,4
👉 Save (CTRL + S)
✅ STEP 2: Upload File to HDFS
hdfs dfs -mkdir /pig_input
hdfs dfs -put movie_rating.txt /pig_input/
👉 Verify:
hdfs dfs -ls /pig_input
✅ STEP 3: Start Apache Pig
pig
👉 You will see:
grunt>
✅ STEP 4: Load Dataset into Pig
movies = LOAD '/pig_input/movie_rating.txt'
USING PigStorage(',')
AS (moviename:chararray, user:chararray, rating:int);
✅ STEP 5: Group Data by MovieName
grp_movies = GROUP movies BY moviename;
✅ STEP 6: Calculate Average Rating
avg_rating = FOREACH grp_movies GENERATE group AS movie, AVG(movies.rating) AS avg_rating;
✅ STEP 7: Display Result
DUMP avg_rating;
🎯 FINAL OUTPUT (Example)
(Avengers,4.5)
(Titanic,4.5)
(Avatar,3.5)
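The GROUP and AVG steps above can be sketched in plain Python (for understanding only; Pig actually runs this as a MapReduce job on the cluster):

```python
from collections import defaultdict

def average_ratings(rows):
    """Group (movie, user, rating) rows by movie and average the ratings."""
    grouped = defaultdict(list)                 # like GROUP movies BY moviename
    for movie, _user, rating in rows:
        grouped[movie].append(rating)
    # like AVG(movies.rating) per group
    return {m: sum(r) / len(r) for m, r in grouped.items()}

rows = [("Avengers", "User1", 4), ("Avengers", "User2", 5),
        ("Titanic", "User3", 5), ("Titanic", "User4", 4),
        ("Avatar", "User5", 3), ("Avatar", "User6", 4)]
print(average_ratings(rows))
# {'Avengers': 4.5, 'Titanic': 4.5, 'Avatar': 3.5}
```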
SET 11
Q1. Perform the following tasks using Apache Pig and a User Defined Function (UDF): [15]
1. Load the dataset into Pig using PigStorage.
2. Display all employee records.
3. Filter employees belonging to the IT department.
4. Create and apply a User Defined Function (UDF) to calculate a 10% bonus on salary.
5. Display the employee name, salary, and calculated bonus.
🧑🏫 APACHE PIG + UDF (EMPLOYEE BONUS)
✅ STEP 1: Create Dataset employee.txt (Windows)
👉 Using Notepad:
101,John,IT,50000
102,Amit,HR,40000
103,Riya,IT,60000
104,Neha,Finance,45000
105,Rahul,IT,70000
Save the file.
✅ STEP 2: Upload File to HDFS
hdfs dfs -mkdir /pig_input
hdfs dfs -put employee.txt /pig_input/
✅ STEP 3: Start Apache Pig
pig
👉 Pig shell:
grunt>
✅ STEP 4: Load Dataset using PigStorage
emp = LOAD '/pig_input/employee.txt'
USING PigStorage(',')
AS (id:int, name:chararray, dept:chararray, salary:int);
✅ STEP 5: Display All Records
DUMP emp;
✅ STEP 6: Filter IT Department Employees
it_emp = FILTER emp BY dept == 'IT';
🧠 STEP 7: Create UDF (10% Bonus)
👉 Create Java file BonusUDF.java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class BonusUDF extends EvalFunc<Double> {
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            // salary is declared as int in the schema, so read it as a Number
            double salary = ((Number) input.get(0)).doubleValue();
            return salary * 0.10; // 10% bonus
        } catch (Exception e) {
            return null;
        }
    }
}
👉 Compile and Create JAR (Windows CMD; adjust the Pig jar name to your installed version):
javac -cp %PIG_HOME%\lib\pig.jar BonusUDF.java
jar -cvf bonus.jar BonusUDF.class
✅ STEP 8: Register UDF in Pig
REGISTER 'bonus.jar';
DEFINE bonus BonusUDF();
✅ STEP 9: Apply UDF
result = FOREACH it_emp GENERATE name, salary, bonus(salary);
✅ STEP 10: Display Result
DUMP result;
🎯 FINAL OUTPUT
(John,50000,5000.0)
(Riya,60000,6000.0)
(Rahul,70000,7000.0)
SET 12
Here is a simple R program (exam-ready) for the FizzBuzz problem:
🧑🏫 R Program: FizzBuzz (1 to 100)
for (i in 1:100) {
  if (i %% 3 == 0 && i %% 5 == 0) {
    print("FizzBuzz")
  } else if (i %% 3 == 0) {
    print("Fizz")
  } else if (i %% 5 == 0) {
    print("Buzz")
  } else {
    print(i)
  }
}
💡 Explanation (Simple)
%% → modulus operator (gives the remainder)
i %% 3 == 0 → divisible by 3
i %% 5 == 0 → divisible by 5
Both true → print "FizzBuzz"
🎯 Sample Output
[1] 1
[1] 2
[1] "Fizz"
[1] 4
[1] "Buzz"
[1] "Fizz"
...
[1] "FizzBuzz"
SET 13
Q1. Write an R program to create a vector of a specified type and length. Create vectors of numeric, complex, logical and character types of length 6.
🧑🏫 R Program: Create Vectors of Different Types
# Numeric vector (length 6)
numeric_vec <- numeric(6)
# Complex vector (length 6)
complex_vec <- complex(6)
# Logical vector (length 6)
logical_vec <- logical(6)
# Character vector (length 6)
char_vec <- character(6)
# Display all vectors
print(numeric_vec)
print(complex_vec)
print(logical_vec)
print(char_vec)
💡 Explanation (Simple)
· numeric(6) → creates a numeric vector of length 6 (default = 0)
· complex(6) → creates a complex vector (default = 0+0i)
· logical(6) → creates a logical vector (default = FALSE)
· character(6) → creates a character vector (default = "")
🎯 Output
[1] 0 0 0 0 0 0
[1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i
[1] FALSE FALSE FALSE FALSE FALSE FALSE
[1] "" "" "" "" "" ""
SET 14
Write an R program to create a list containing strings, numbers, vectors and logical values.
🧑🏫 R Program: Create a List with Mixed Data Types
# Create a list
my_list <- list(
  name = "Tanishq",        # string
  age = 21,                # number
  marks = c(85, 90, 78),   # vector
  passed = TRUE            # logical value
)
# Display the list
print(my_list)
💡 Explanation (Simple)
· list() → used to store different types of data together
· "Tanishq" → string
· 21 → numeric value
· c(85, 90, 78) → vector
· TRUE → logical value
🎯 Output
$name
[1] "Tanishq"
$age
[1] 21
$marks
[1] 85 90 78
$passed
[1] TRUE
SET 15
Q1. Write an R program to sort a vector in ascending and descending order.
🧑🏫 R Program: Sort a Vector
# Create a vector
vec <- c(45, 12, 78, 23, 56, 9)
# Sort in ascending order
asc <- sort(vec)
# Sort in descending order
desc <- sort(vec, decreasing = TRUE)
# Display results
print("Ascending Order:")
print(asc)
print("Descending Order:")
print(desc)
💡 Explanation (Simple)
· sort(vec) → sorts the vector in ascending order
· sort(vec, decreasing = TRUE) → sorts in descending order
🎯 Output
[1] "Ascending Order:"
[1] 9 12 23 45 56 78
[1] "Descending Order:"
[1] 78 56 45 23 12 9
SET 16
Write an R program to find the Sum, Mean and Product of a vector.
🧑🏫 R Program: Sum, Mean, Product
# Create a vector
vec <- c(2, 4, 6, 8, 10)
# Calculate Sum
sum_val <- sum(vec)
# Calculate Mean
mean_val <- mean(vec)
# Calculate Product
prod_val <- prod(vec)
# Display results
print(paste("Sum =", sum_val))
print(paste("Mean =", mean_val))
print(paste("Product =", prod_val))
💡 Explanation (Simple)
· sum(vec) → adds all elements
· mean(vec) → average of the elements
· prod(vec) → multiplies all elements
🎯 Output
[1] "Sum = 30"
[1] "Mean = 6"
[1] "Product = 3840"
SET 17
Q1. Write an R program to create a list named s containing a sequence of 15 capital letters, starting from 'E'.
🧑🏫 R Program: List of 15 Capital Letters Starting from 'E'
# Create sequence of letters from E (E is the 5th letter)
letters_seq <- LETTERS[5:(5 + 14)]
# Create list named s
s <- list(letters_seq)
# Display the list
print(s)
💡 Explanation (Simple)
· LETTERS → built-in vector of A to Z
· LETTERS[5] → "E"
· 5:(5 + 14) → selects 15 letters starting from E
· list() → creates a list
🎯 Output
[[1]]
[1] "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
SET 18
Write an R program to extract all elements except the third element of the first vector of a given list.
🧑🏫 R Program: Extract All Elements Except the 3rd (from the First Vector in a List)
# Create a list with vectors
my_list <- list(
  c(10, 20, 30, 40, 50),
  c(5, 15, 25)
)
# Extract the first vector and remove its 3rd element
result <- my_list[[1]][-3]
# Display result
print(result)
🎯 Output
[1] 10 20 40 50
SET 19
Q1. Create a dataset sales_data.txt containing the following fields: [15]
ProductID, ProductName, Quantity, Price
Example dataset:
101,Laptop,5,50000
102,Mobile,10,20000
103,Tablet,7,15000
104,Laptop,3,50000
105,Mobile,6,20000
Perform the following tasks using Apache Hive:
1. Create a Hive database named sales_db. (2 Marks)
2. Create a Hive table named sales with appropriate fields. (3 Marks)
3. Load the dataset sales_data.txt into the Hive table. (3 Marks)
4. Display all records from the table. (2 Marks)
5. Write a Hive query to calculate the total quantity sold for each product using GROUP BY. (3 Marks)
6. Display the result showing ProductName and Total Quantity. (2 Marks)
🧑🏫 APACHE HIVE PRACTICAL (SALES DATA)
✅ STEP 1: Create Dataset sales_data.txt (Windows)
👉 Using Notepad:
101,Laptop,5,50000
102,Mobile,10,20000
103,Tablet,7,15000
104,Laptop,3,50000
105,Mobile,6,20000
👉 Save the file
✅ STEP 2: Upload File to HDFS
hdfs dfs -mkdir /hive_input
hdfs dfs -put sales_data.txt /hive_input/
✅ STEP 3: Start Hive
hive
👉 The Hive shell will open:
hive>
✅ STEP 4: Create Database (2 Marks)
CREATE DATABASE sales_db;
👉 Use the database:
USE sales_db;
✅ STEP 5: Create Table (3 Marks)
CREATE TABLE sales (
  product_id INT,
  product_name STRING,
  quantity INT,
  price INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
✅ STEP 6: Load Data into Table (3 Marks)
LOAD DATA INPATH '/hive_input/sales_data.txt'
INTO TABLE sales;
✅ STEP 7: Display All Records (2 Marks)
SELECT * FROM sales;
✅ STEP 8: Calculate Total Quantity (GROUP BY) (3 Marks)
SELECT product_name, SUM(quantity) AS total_quantity
FROM sales
GROUP BY product_name;
✅ STEP 9: Display Result (2 Marks)
👉 Output:
Laptop 8
Mobile 16
Tablet 7
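The GROUP BY query in STEP 8 can be mirrored with a short Python sketch (illustration only; Hive executes the real query on the cluster):

```python
from collections import defaultdict

def total_quantity(rows):
    """SUM(quantity) ... GROUP BY product_name, in plain Python."""
    totals = defaultdict(int)
    for _pid, product_name, quantity, _price in rows:
        totals[product_name] += quantity
    return dict(totals)

rows = [(101, "Laptop", 5, 50000), (102, "Mobile", 10, 20000),
        (103, "Tablet", 7, 15000), (104, "Laptop", 3, 50000),
        (105, "Mobile", 6, 20000)]
print(total_quantity(rows))
# {'Laptop': 8, 'Mobile': 16, 'Tablet': 7}
```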
SET 20
Q1. Write an R program to create an array of three 3x2 matrices, each with 3 rows and 2 columns, from two given vectors of different lengths.
Here is a simple R program (exam-ready) to create an array of three 3×2 matrices using two vectors of different lengths:
🧑🏫 R Program: Array of 3 Matrices (3×2)
# Create two vectors of different lengths
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(7, 8, 9, 10)
# Combine both vectors
data <- c(v1, v2)
# Create array (3 rows, 2 columns, 3 matrices)
arr <- array(data, dim = c(3, 2, 3))
# Display array
print(arr)
💡 Explanation (Simple)
· v1 and v2 → two vectors of different lengths
· c(v1, v2) → combines them into one vector (10 values; R recycles it to fill all 18 cells)
· array() → creates a multi-dimensional structure
· dim = c(3, 2, 3) → 3 rows, 2 columns, 3 matrices
🎯 Output (Example)
, , 1
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
, , 2
     [,1] [,2]
[1,]    7   10
[2,]    8    1
[3,]    9    2
, , 3
     [,1] [,2]
[1,]    3    6
[2,]    4    7
[3,]    5    8
SET 21
Q1. Write an R program to convert a given matrix to a list and print the list in ascending order.
🧑🏫 R Program: Convert Matrix to List & Sort Ascending
# Create a matrix
mat <- matrix(c(8, 3, 5, 1, 9, 2), nrow = 2)
# Convert matrix to list
lst <- as.list(mat)
# Sort list elements in ascending order
sorted_lst <- sort(unlist(lst))
# Display result
print(sorted_lst)
💡 Explanation (Simple)
· matrix() → creates a matrix
· as.list(mat) → converts the matrix into a list
· unlist(lst) → converts the list back to a vector (for sorting)
· sort() → sorts values in ascending order
🎯 Output
[1] 1 2 3 5 8 9
SET 22
Q1. Write an R program to create a data frame from four given vectors and display the structure and statistical summary of the data frame.
🧑🏫 R Program: Data Frame + Structure + Summary
# Create four vectors
id <- c(1, 2, 3, 4, 5)
name <- c("John", "Amit", "Riya", "Neha", "Rahul")
age <- c(21, 22, 20, 23, 21)
marks <- c(75, 68, 85, 70, 90)
# Create data frame
df <- data.frame(id, name, age, marks)
# Display data frame
print(df)
# Display structure
str(df)
# Display statistical summary
summary(df)
💡 Explanation (Simple)
· data.frame() → combines vectors into a table format
· str(df) → shows the structure (type of each column)
· summary(df) → shows statistics (min, max, mean, etc.)
🎯 Output (Example)
Structure:
'data.frame': 5 obs. of 4 variables:
 $ id   : num 1 2 3 4 5
 $ name : chr "John" "Amit" ...
 $ age  : num 21 22 20 23 21
 $ marks: num 75 68 85 70 90
Summary:
       id        age            marks
 Min.  :1   Min.  :20.0   Min.  :68.0
 Max.  :5   Max.  :23.0   Max.  :90.0
 Mean  :3   Mean  :21.4   Mean  :77.6
SET 23
Q1. Write an R program to create inner, outer, left, and right joins (merge) from two given data frames.
🧑🏫 R Program: Joins (Merge)
# Create first data frame
df1 <- data.frame(
  id = c(1, 2, 3, 4),
  name = c("Amit", "John", "Riya", "Neha")
)
# Create second data frame
df2 <- data.frame(
  id = c(2, 3, 4, 5),
  marks = c(80, 90, 85, 70)
)
# Inner Join
inner_join <- merge(df1, df2, by = "id")
# Left Join
left_join <- merge(df1, df2, by = "id", all.x = TRUE)
# Right Join
right_join <- merge(df1, df2, by = "id", all.y = TRUE)
# Full Outer Join
outer_join <- merge(df1, df2, by = "id", all = TRUE)
# Display results
print("Inner Join:")
print(inner_join)
print("Left Join:")
print(left_join)
print("Right Join:")
print(right_join)
print("Outer Join:")
print(outer_join)
💡 Explanation (Simple)
· merge() → used to join data frames
· by = "id" → common column
· all.x = TRUE → Left Join
· all.y = TRUE → Right Join
· all = TRUE → Full Outer Join
🎯 Output (Example)
Inner Join
  id name marks
1  2 John    80
2  3 Riya    90
3  4 Neha    85
Left Join
  id name marks
1  1 Amit    NA
2  2 John    80
3  3 Riya    90
4  4 Neha    85
Right Join
  id name marks
1  2 John    80
2  3 Riya    90
3  4 Neha    85
4  5 <NA>    70
Outer Join
  id name marks
1  1 Amit    NA
2  2 John    80
3  3 Riya    90
4  4 Neha    85
5  5 <NA>    70
SET 24
Q1. Using the inbuilt mtcars dataset, perform the following:
a. Display all the cars having mpg more than 20.
b. Subset the dataset by the mpg column for values greater than 15.0.
🧑🏫 Using Dataset: mtcars
# Load dataset
data(mtcars)
✅ (a) Display cars having mpg > 20
# Filter cars with mpg greater than 20
high_mpg <- mtcars[mtcars$mpg > 20, ]
# Display result
print(high_mpg)
💡 Explanation
· mtcars$mpg > 20 → condition
· mtcars[condition, ] → filters rows
· Displays all cars with mileage greater than 20
✅ (b) Subset the mpg column for values > 15
# Subset only the mpg column for values > 15
mpg_subset <- mtcars$mpg[mtcars$mpg > 15]
# Display result
print(mpg_subset)
SET 25
Q1. Using the inbuilt airquality dataset, perform the following:
a. Subset the dataset for the month of July having Wind value greater than 10.
b. Find the number of days having temperature less than 60.
🧑🏫 Using Dataset: airquality
# Load dataset
data(airquality)
✅ (a) Subset for July (Month == 7) & Wind > 10
# Subset data
july_data <- airquality[airquality$Month == 7 & airquality$Wind > 10, ]
# Display result
print(july_data)
💡 Explanation
· Month == 7 → selects July data
· Wind > 10 → selects rows with wind greater than 10
· & → AND condition
✅ (b) Number of Days with Temperature < 60
# Count days
count_days <- sum(airquality$Temp < 60, na.rm = TRUE)
# Display result
print(count_days)
💡 Explanation
· Temp < 60 → condition
· sum() → counts TRUE values
· na.rm = TRUE → ignores missing values
🎯 Final Understanding
· Subset → filter the dataset using conditions
· Count → use sum() on a logical condition
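The counting trick in part (b), summing a logical condition, works the same way in most languages. A tiny Python sketch with hypothetical temperature readings (not the real airquality values):

```python
# Hypothetical temperature readings (illustrative data only)
temps = [57, 62, 59, 71, 55, 66]

# A comparison produces booleans; summing them counts the True values,
# just like sum(airquality$Temp < 60) in R
count_days = sum(t < 60 for t in temps)
print(count_days)
# 3
```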
SET 26
Write an R program to draw an empty plot, specifying the axis limits of the graphic.
🧑🏫 R Program: Empty Plot with Axis Limits
# Create an empty plot with defined axis limits
plot(1,
     type = "n",
     xlim = c(0, 10),
     ylim = c(0, 20),
     xlab = "X-Axis",
     ylab = "Y-Axis",
     main = "Empty Plot with Axis Limits")
💡 Explanation (Simple)
· type = "n" → creates an empty plot (no points drawn)
· xlim = c(0, 10) → sets the X-axis limits
· ylim = c(0, 20) → sets the Y-axis limits
· plot(1, ...) → initializes the plot
🎯 Output
👉 A blank graph will appear with:
· X-axis from 0 to 10
· Y-axis from 0 to 20
· No data points (empty plot)
SET 27
Q1. Using the inbuilt mtcars dataset:
a) Create a bar plot for the mpg attribute for all cars having 3 gears.
b) Create a histogram to show the number of cars per carburetor type whose mpg is greater than 20.
🧑🏫 Using Dataset: mtcars
# Load dataset
data(mtcars)
✅ (a) Bar Plot: mpg for Cars with 3 Gears
# Filter cars with 3 gears
gear3 <- mtcars[mtcars$gear == 3, ]
# Create bar plot
barplot(gear3$mpg,
        main = "MPG of Cars with 3 Gears",
        xlab = "Cars",
        ylab = "MPG",
        col = "orange")
💡 Explanation
· mtcars$gear == 3 → selects cars with 3 gears
· barplot() → displays the mpg values
✅ (b) Histogram: Cars per Carburetor Type (mpg > 20)
# Filter cars with mpg > 20
filtered <- mtcars[mtcars$mpg > 20, ]
# Create histogram of carburetor types
hist(filtered$carb,
     main = "Cars per Carburetor Type (MPG > 20)",
     xlab = "Carburetor Type",
     col = "lightblue")
💡 Explanation
· mpg > 20 → selects fuel-efficient cars
· carb → carburetor type
· hist() → shows the distribution
🎯 Final Understanding
· Bar plot → mpg values for 3-gear cars
· Histogram → distribution of carburetor types for cars with mpg > 20
SET 28
Q1. Using the airquality dataset:
a) Create a scatter plot to show the relationship between ozone and wind values, giving an appropriate value to the color argument.
b) Create a bar plot to show the ozone level for all the days having temperature greater than 70.
🧑🏫 Using Dataset: airquality
# Load dataset
data(airquality)
✅ (a) Scatter Plot: Ozone vs Wind (with color)
plot(airquality$Wind,
     airquality$Ozone,
     main = "Ozone vs Wind",
     xlab = "Wind",
     ylab = "Ozone",
     col = "red",
     pch = 19)
💡 Explanation
· Wind → X-axis
· Ozone → Y-axis
· col = "red" → sets the color of the points
· pch = 19 → solid dots
✅ (b) Bar Plot: Ozone Level (Temp > 70)
# Filter data where temperature > 70
filtered_data <- airquality[airquality$Temp > 70, ]
# Create bar plot
barplot(filtered_data$Ozone,
        main = "Ozone Levels (Temp > 70)",
        xlab = "Days",
        ylab = "Ozone",
        col = "blue")
💡 Explanation
· Temp > 70 → selects hot days
· barplot() → shows ozone levels for those days
🎯 Final Understanding
· Scatter plot → shows the relationship between wind and ozone
· Bar plot → shows ozone levels on hotter days
SET 29
Q1. Using the inbuilt mtcars dataset:
a. Create a bar plot that shows the number of cars of each gear type.
b. Draw a scatter plot showing the relationship between wt and mpg for all the cars having 4 gears.
🧑🏫 Using Inbuilt Dataset mtcars
# Load dataset
data(mtcars)
✅ (a) Bar Plot: Number of Cars for Each Gear Type
# Count number of cars for each gear
gear_count <- table(mtcars$gear)
# Create bar plot
barplot(gear_count,
        main = "Number of Cars by Gear Type",
        xlab = "Gears",
        ylab = "Number of Cars",
        col = "lightgreen")
💡 Explanation
· table(mtcars$gear) → counts cars for each gear
· barplot() → creates the bar graph
✅ (b) Scatter Plot: wt vs mpg (Only 4 Gears)
# Filter cars with 4 gears
gear4 <- mtcars[mtcars$gear == 4, ]
# Create scatter plot
plot(gear4$wt,
     gear4$mpg,
     main = "Weight vs MPG (4 Gear Cars)",
     xlab = "Weight (wt)",
     ylab = "MPG",
     pch = 19,
     col = "blue")
💡 Explanation
· mtcars$gear == 4 → selects only 4-gear cars
· plot(x, y) → scatter plot
· wt vs mpg → shows the relationship
🎯 Final Understanding
· Bar plot → shows the count of cars by gear
· Scatter plot → shows the relationship between weight and mileage
SET 30
Q1. Draw a boxplot to show the distribution of mpg values per number of gears.
🧑🏫 R Program: Boxplot (mpg vs gears)
# Use built-in dataset
data(mtcars)
# Create boxplot
boxplot(mpg ~ gear, data = mtcars,
        main = "MPG Distribution by Number of Gears",
        xlab = "Number of Gears",
        ylab = "Miles Per Gallon (mpg)",
        col = "lightblue")
💡 Explanation (Simple)
- mtcars → built-in dataset in R
- mpg ~ gear → compares mpg across the number of gears
- boxplot() → creates the boxplot
- col → adds color for better visualization
🎯 Output
👉 A boxplot showing:
- X-axis → number of gears (3, 4, 5)
- Y-axis → mpg values
- Each box shows the distribution (min, Q1, median, Q3, max)