Essential Concepts of AI & ML
Artificial Intelligence
1. What is Artificial Intelligence (AI)?
Definition:
Artificial Intelligence is the simulation of human intelligence in machines. These machines are programmed to think, learn, and solve problems—just like we do!
Cool Fact:
The term Artificial Intelligence was coined in 1956 at a conference at Dartmouth College. The idea? To create machines that can think.
AI ≠ Magic. It’s just complex math + smart logic!
2. A Quick Peek into the History of AI
| Era | Highlights |
| 1950s | Alan Turing proposes the Turing Test to measure machine intelligence. |
| 1956 | The term AI is born (John McCarthy, Dartmouth Conference). |
| 1960s-70s | AI systems play checkers, solve algebra, and prove theorems. |
| 1980s | Rise of Expert Systems (rule-based). |
| 2000s | Machine Learning becomes the mainstream approach. |
| 2010s–Now | Deep Learning, Self-driving cars, Chatbots, AI in healthcare. |
Quick Quiz 1:
- Who is called the “Father of Artificial Intelligence”?
A) Alan Turing
B) John McCarthy
C) Andrew Ng
D) Geoffrey Hinton
>Answer: B) John McCarthy
3. AI vs ML vs Deep Learning – What’s the Difference?
| Feature | AI | Machine Learning (ML) | Deep Learning (DL) |
| Definition | Simulates human intelligence | AI that learns from data | ML with deep (multi-layer) neural networks |
| Example | Chatbots | Email spam filter | Face recognition |
| Dependency | Rules & logic | Data patterns | Big data & GPUs |
| Focus | Smart behavior | Learning from data | High accuracy & automation |
Analogy:
Think of AI as the brain, ML as the learning part of the brain, and DL as the deep, complex thinking.
Quick Quiz 2:
- Which of the following is a subset of Machine Learning?
A) AI
B) Deep Learning
C) Robotics
D) IoT
>Answer: B) Deep Learning
4. Applications of AI in Real Life
| Sector | Use Case |
| Healthcare | Disease diagnosis, drug discovery |
| Transport | Self-driving cars, traffic prediction |
| Business | Chatbots, customer behavior prediction |
| Technology | Voice assistants (Siri, Alexa) |
| Gaming | NPC behavior, real-time learning |
| Agriculture | Crop monitoring, disease detection |
| Education | Smart tutors, personalized learning paths |
Did You Know?
AI can detect diseases like cancer earlier than doctors in some cases!
5. Agents and Environments
Agent:
An agent is anything that can perceive its environment and take actions to achieve goals.
Example: A robot vacuum detects dirt (perception) and moves to clean it (action).
Environment:
The surroundings where the agent operates. For the robot vacuum, the room is the environment.
6. Types of Agents
| Type | Description | Example |
| Simple Reflex Agent | Acts only on current input. | Thermostat |
| Model-Based Agent | Uses memory/history + input. | Robot vacuum |
| Goal-Based Agent | Acts to achieve specific goal. | Path-finding robot |
| Utility-Based Agent | Chooses the action that maximizes utility (a numeric score of how desirable an outcome is). | Self-driving car choosing the safest route |
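The difference between the first two rows can be sketched in a few lines. This is an illustrative sketch, not any real device's logic: the 20 °C set-point, the action strings, and the two-cell world are assumptions. The thermostat reacts only to the current reading, while the vacuum keeps an internal record (its model) of which cells it has already seen.

```python
def thermostat_agent(temperature_c, set_point_c=20.0):
    """Simple reflex agent: the action depends only on the current percept."""
    return "heat_on" if temperature_c < set_point_c else "heat_off"


class VacuumAgent:
    """Model-based agent: remembers which cells it has already visited."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.visited = set()  # internal state built from past percepts

    def act(self, location, is_dirty):
        self.visited.add(location)
        if is_dirty:
            return "suck"
        # Use memory to pick a cell it has not seen yet
        for cell in self.cells:
            if cell not in self.visited:
                return f"go_to {cell}"
        return "stop"  # every cell visited: the model says the job is done
```

For example, `VacuumAgent(["A", "B"])` answers `"suck"` for a dirty cell A, then `"go_to B"` once A is clean, and `"stop"` after B has also been seen clean. A simple reflex vacuum could not know when to stop, because it has no memory of cells it already visited.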
Thought Bubble:
Which agent would a drone delivering packages be? (Hint: It should avoid birds, fly safely, and reach the destination.)
Quick Quiz 3:
- Which type of agent uses both current state and goal to make decisions?
A) Simple Reflex
B) Model-Based
C) Goal-Based
D) Utility-Based
>Answer: C) Goal-Based
7. Agent Architecture
Agent architecture defines how an agent is built.
Types include:
- Reactive Architecture – Instant response.
- Deliberative Architecture – Thinks, then acts.
- Hybrid Architecture – Combines both.
Wrap-up Activity: Match the Agent!
Match the AI system to the correct agent type.
| AI System | Agent Type |
| a) Google Maps | 1) Utility-Based |
| b) Smart AC | 2) Simple Reflex |
| c) Delivery Drone | 3) Goal-Based |
| d) ChatGPT | 4) Model-Based |
Answers:
a → 1
b → 2
c → 3
d → 4
Final Thought:
AI is not just the future—it’s the present. The more we understand how it works, the better we can build, use, and improve it.
Breadth-First Search (BFS)
Breadth-first search (BFS) visits every node on one level of the tree, starting from the root, before moving down to the next level.
# Example usage:
# Constructing a simple binary tree
#        1
#      /   \
#     2     3
#    / \   / \
#   4   5 6   7
Level 0: Visit the root node 1.
Level 1: Visit the children of the root, 2 and 3, from left to right.
Level 2: Visit all the grandchildren, 4, 5, 6, and 7, from left to right.
The traversal is managed using a Queue structure.
The final, level-by-level order is:$$\mathbf{1 \to 2 \to 3 \to 4 \to 5 \to 6 \to 7}$$
from collections import deque


class TreeNode:
    def __init__(self, value=0, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)
root.right.left = TreeNode(6)
root.right.right = TreeNode(7)


def BFSSearch(root, x):
    """Level-order search for x. Returns the values visited up to and
    including x, or None if x is not in the tree."""
    if not root:
        return None
    nodes = deque([root])          # FIFO queue of nodes waiting to be visited
    visitedNodes = []
    while nodes:
        element = nodes.popleft()  # O(1), unlike list.pop(0)
        visitedNodes.append(element.value)
        if x == element.value:
            return visitedNodes
        if element.left:
            nodes.append(element.left)
        if element.right:
            nodes.append(element.right)
    return None                    # every node visited without finding x


def DFSSearch(root, x):
    """Pre-order depth-first search for x using an explicit stack."""
    if not root:
        return None
    stack = [root]                 # LIFO stack of nodes waiting to be visited
    visitedNodes = []
    while stack:
        element = stack.pop()
        visitedNodes.append(element.value)
        if x == element.value:
            return visitedNodes
        # Push the right child first so the left child is visited first
        if element.right:
            stack.append(element.right)
        if element.left:
            stack.append(element.left)
    return None


target = 5
choice = input("Enter your choice as BFS/DFS: ").strip().upper()
if choice not in ("BFS", "DFS"):
    print("Invalid input")
elif root is None:
    print("Tree doesn't exist!")
else:
    search = BFSSearch if choice == "BFS" else DFSSearch
    visitedNodes = search(root, target)
    if visitedNodes:
        print("The searched node exists with path:", visitedNodes)
    else:
        print(target, "doesn't exist in the tree")
Sample output (choosing BFS):
The searched node exists with path: [1, 2, 3, 4, 5]
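The level-by-level description above (Level 0, Level 1, Level 2) can be made explicit by draining the queue one level at a time. This is a minimal sketch; the helper name `bfs_levels` is hypothetical, and it redefines a small `TreeNode` so the block runs on its own.

```python
from collections import deque


class TreeNode:
    def __init__(self, value=0, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def bfs_levels(root):
    """Return node values grouped by depth, e.g. [[1], [2, 3], [4, 5, 6, 7]]."""
    if root is None:
        return []
    levels = []
    queue = deque([root])
    while queue:
        level = []
        # Everything in the queue right now belongs to the same depth
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.value)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)),
                TreeNode(3, TreeNode(6), TreeNode(7)))
print(bfs_levels(root))  # [[1], [2, 3], [4, 5, 6, 7]]
```

Flattening the result gives the same order 1 → 2 → 3 → 4 → 5 → 6 → 7.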
Depth-First Search (DFS)
Depth-first search (DFS) follows one branch as deep as it goes before backtracking to explore the next branch.
# Example usage:
# Constructing a simple binary tree
#        1
#      /   \
#     2     3
#    / \   / \
#   4   5 6   7
DFS explores the tree by going as far as possible down one branch before backtracking to the nearest unexplored branch. The chosen strategy here is Pre-order (Node $\to$ Left $\to$ Right).
Traversal Order
The algorithm prioritizes depth at every step, resulting in the following sequence:
- Start (Root): 1
- Left Branch:
- 2
- 4 (Deepest point on this left sub-branch)
- 5 (Exploring 2’s right sub-branch)
- Right Branch:
- 3
- 6 (Deepest point on this right sub-branch)
- 7 (Exploring 3’s right sub-branch)
Final Path
The final, combined DFS Pre-order path is:
$$\mathbf{1 \to 2 \to 4 \to 5 \to 3 \to 6 \to 7}$$
class TreeNode:
    def __init__(self, value=0, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)
root.right.left = TreeNode(6)
root.right.right = TreeNode(7)


def DFSSearch(root, x):
    """Pre-order DFS (Node -> Left -> Right) with an explicit stack.
    Returns the values visited up to and including x, or None."""
    if not root:
        return None
    visitedNodes = []
    stack = [root]
    while stack:
        node = stack.pop()              # LIFO: the newest node is explored first
        visitedNodes.append(node.value)
        if x == node.value:
            return visitedNodes
        # Push right before left so the left subtree is explored first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
    return None                         # x is not in the tree


visitedNodes = DFSSearch(root, 5)
if visitedNodes:
    print("Visited node found with path:", visitedNodes)
else:
    print("Node does not exist")
Visited node found with path: [1, 2, 4, 5]
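For comparison, the same pre-order path can be produced recursively, with Python's call stack playing the role of the explicit `stack` list. A minimal sketch; `preorder` is a hypothetical helper name, and it visits the whole tree instead of stopping at a target.

```python
class TreeNode:
    def __init__(self, value=0, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def preorder(node, out=None):
    """Recursive pre-order DFS (Node -> Left -> Right)."""
    if out is None:
        out = []
    if node is None:
        return out
    out.append(node.value)      # visit the node itself first
    preorder(node.left, out)    # then go as deep as possible on the left
    preorder(node.right, out)   # finally explore the right subtree
    return out


root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)),
                TreeNode(3, TreeNode(6), TreeNode(7)))
print(preorder(root))  # [1, 2, 4, 5, 3, 6, 7]
```

The recursive form is shorter; the explicit stack avoids hitting Python's recursion limit on very deep trees.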
Knowledge-Based AI
What is a Knowledge Base?
- AI systems often need to store, represent, and use knowledge about the world.
- Knowledge base (KB): a repository of facts and rules.
- Reasoning (inference) engine: uses the KB to answer queries, draw inferences, and make decisions.
Real-life Example:
- Google Maps has a knowledge base of roads, traffic conditions, and locations. The reasoning engine infers the shortest or fastest route for the user.
MCQs
Q1.1 Which of the following best describes a knowledge base in AI?
(a) A set of input data
(b) A collection of facts and rules
(c) A type of neural network
(d) A random dataset
Q1.2 Which of the following is NOT a component of a knowledge-based system?
(a) Knowledge base
(b) Inference engine
(c) User interface
(d) Compiler
Q1.3 Which real-life application most closely represents a knowledge-based AI system?
(a) Google Maps route planning
(b) Spotify music recommendations
(c) Amazon product shipping
(d) A digital camera’s autofocus
✅ Answers: Q1.1 → (b), Q1.2 → (d), Q1.3 → (a)
Practice Question
Give two real-world examples of knowledge-based AI systems and explain how their knowledge base helps in reasoning.
Building Knowledge Bases
- Requires:
- Facts: “Paris is the capital of France.”
- Rules: “If a city is a capital, then it is important.”
- Knowledge bases must be consistent, complete, and up-to-date.
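A knowledge base of facts and if-then rules can be queried by forward chaining: keep firing every rule whose premises are all known facts until nothing new can be derived. This is a minimal sketch under simplifying assumptions: facts are plain strings, the capital rule is written for Paris directly (no variables), and the second rule is a hypothetical addition to show chaining.

```python
# Facts are plain strings; each rule is (set of premises, conclusion).
facts = {"Paris is the capital of France"}
rules = [
    ({"Paris is the capital of France"}, "Paris is important"),
    ({"Paris is important"}, "Paris is well known"),  # hypothetical extra rule
]


def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)   # the rule fires: store the new fact
                changed = True
    return known


derived = forward_chain(facts, rules)
print(sorted(derived))
```

The second fact only appears because the first rule fired earlier, which is what "chaining" refers to.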
Real-life Example:
- Medical expert systems store disease-symptom relationships to diagnose illnesses.
MCQs
Q3.1 Which is NOT a challenge in building knowledge bases?
(a) Consistency
(b) Completeness
(c) Updating information
(d) Arithmetic operations
Q3.2 Which application uses a large knowledge base for reasoning?
(a) Weather forecasting
(b) Expert medical diagnosis system
(c) Stock market charting tool
(d) Calculator
✅ Answers: Q3.1 → (d), Q3.2 → (b)
Practice Question
What are the challenges in keeping a knowledge base consistent? Give one real-world example.
> In a medical knowledge base, one entry may state “Drug A is safe for pregnant women” while another states “Drug A should not be used during pregnancy.” Such inconsistency can mislead doctors and harm patients, showing the importance of consistency management.
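One simple way to catch a contradiction like this automatically is to store each statement as a (subject, property, truth value) triple and flag any subject/property pair that is asserted both true and false. A minimal sketch; the triple format is an assumption, and real expert systems use much richer representations.

```python
def find_contradictions(statements):
    """Return (subject, property) pairs asserted both True and False."""
    seen = {}
    conflicts = set()
    for subject, prop, value in statements:
        key = (subject, prop)
        if key in seen and seen[key] != value:
            conflicts.add(key)
        seen.setdefault(key, value)  # remember the first value asserted
    return conflicts


kb = [
    ("Drug A", "safe during pregnancy", True),
    ("Drug A", "safe during pregnancy", False),  # conflicting entry
    ("Drug B", "safe during pregnancy", True),
]
print(find_contradictions(kb))  # {('Drug A', 'safe during pregnancy')}
```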
Logic and Reasoning
(a) Propositional Logic
- It deals with statements (propositions) that are either true or false.
- Uses logical connectives: AND (∧), OR (∨), NOT (¬), IMPLIES (→).
Example:
- “If it rains (P), then the ground is wet (Q).”
- Statement: P→Q
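A truth table makes the meaning of P → Q concrete: the rule "if it rains, the ground is wet" is broken only when it rains (P true) and the ground is not wet (Q false). A minimal sketch using the standard-library `itertools.product` to list every combination of truth values.

```python
from itertools import product


def implies(p, q):
    """Material implication: P -> Q is false only when P is true and Q is false."""
    return (not p) or q


print("P      Q      P->Q")
for p, q in product([True, False], repeat=2):
    print(f"{p!s:<6} {q!s:<6} {implies(p, q)}")
```

The printed table has four rows, and only the row `True False` yields `False`.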
MCQs
Q2.1 Which of the following is a valid propositional statement?
(a) x > 5
(b) “The sun is shining.”
(c) All humans are mortal.
(d) There exists an x such that x² = 4.
Q2.2 If P = “It rains” and Q = “Ground is wet”, then the statement “If it rains, then ground is wet” is represented as:
(a) P∨Q
(b) P∧Q
(c) P→Q
(d) Q→P
✅ Answers: Q2.1 → (b), Q2.2 → (c)
Practice Question
- Represent the following in propositional logic:
- “If the light is on, then the room is bright.”
(b) Predicate Logic (First-Order Logic, FOL)
- It extends propositional logic by including quantifiers and predicates.
- Quantifiers: ∃ (there exists) and ∀ (for all).
- Can represent relationships between objects.
Example:
- “All humans are mortal.”
- ∀x (Human(x) → Mortal(x))
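Over a small, finite domain, ∀ and ∃ behave like Python's built-in `all()` and `any()`. A minimal sketch; the domain and the predicate tables are hypothetical, and real first-order domains can be infinite, where this direct check is not possible.

```python
# Hypothetical finite domain and predicate extensions
domain = ["Socrates", "Plato", "Fido"]
humans = {"Socrates", "Plato"}
mortals = {"Socrates", "Plato", "Fido"}


def Human(x):
    return x in humans


def Mortal(x):
    return x in mortals


# ∀x (Human(x) → Mortal(x)): the implication must hold for every x
all_humans_mortal = all((not Human(x)) or Mortal(x) for x in domain)

# ∃x Human(x): at least one x must satisfy the predicate
some_human_exists = any(Human(x) for x in domain)

print(all_humans_mortal, some_human_exists)  # True True
```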
MCQs
Q2.3 Which of the following statements can only be represented in predicate logic (not propositional)?
(a) “The ground is wet.”
(b) “If it rains, the ground is wet.”
(c) “All humans are mortal.”
(d) “It is sunny.”
Q2.4 The symbol ∃x means:
(a) For all x
(b) There exists an x
(c) Not x
(d) Implies x
✅ Answers: Q2.3 → (c), Q2.4 → (b)
Practice Question
- Write the predicate logic expression for:
- “There exists a student who studies AI.”
(c) Inference in First-Order Logic
- Deriving new facts from known facts using inference rules like Modus Ponens.
Example:
- Rule: “All cats are animals.”
- Fact: “Tom is a cat.”
- Inference: “Tom is an animal.”
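The cat example combines two steps: Universal Instantiation (substitute Tom for x in ∀x (Cat(x) → Animal(x))) and Modus Ponens. A minimal sketch for rules with one premise and one variable; the `(predicate, constant)` tuple encoding is an assumption made for illustration.

```python
# Facts are (predicate, constant) pairs; the rule ("Cat", "Animal") means
# ∀x (Cat(x) → Animal(x)).
facts = {("Cat", "Tom")}
rules = [("Cat", "Animal")]


def infer(facts, rules):
    """Repeatedly apply Universal Instantiation + Modus Ponens."""
    known = set(facts)
    while True:
        new = {(conclusion, const)
               for premise, conclusion in rules
               for pred, const in known
               if pred == premise and (conclusion, const) not in known}
        if not new:
            return known
        known |= new


print(sorted(infer(facts, rules)))  # [('Animal', 'Tom'), ('Cat', 'Tom')]
```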
MCQs
Q2.5 Inference in first-order logic refers to:
(a) Storing facts in memory
(b) Drawing new conclusions from existing facts
(c) Assigning probabilities to events
(d) Creating datasets
Q2.6 Which inference rule is used in:
- If P → Q and P is true, then Q is true?
(a) Modus Tollens
(b) Modus Ponens
(c) Resolution
(d) Universal Instantiation
✅ Answers: Q2.5 → (b), Q2.6 → (b)
Practice Question
Q1. Given:
- All teachers are educated.
- Amit is a teacher.
- Infer the conclusion using predicate logic.
Uncertainty Handling
Real-world data is often incomplete or noisy, so AI uses probability models instead of exact logic.
Example: Sometimes, patient symptoms may be incomplete or noisy (e.g., fever, headache could mean flu, dengue, or malaria).
(a) Probability Basics
- Definition: Probability measures the likelihood of an event.
- Range: 0≤P(Event)≤1
- Formula (when all outcomes are equally likely):
P(A) = Favorable outcomes / Total outcomes
Examples:
- Tossing a fair coin: P(Head)=0.5
- Rolling a fair die: P(Prime) = 3/6 = 0.5 (the primes are 2, 3, 5)
Addition Rule:
- P(A∪B)=P(A)+P(B)−P(A∩B)
Example:
- A card is drawn from a standard 52-card deck.
- Event A: card is a heart → P(A)=13/52
- Event B: card is a king → P(B)=4/52
- Event A ∩ B: card is the king of hearts → P(A∩B)=1/52
So, P(heart or king) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13.
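The 4/13 result can be checked by brute force: build all 52 cards, count the ones that are a heart or a king, and compare with the addition rule. A minimal sketch using `fractions.Fraction` so the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

p_heart = Fraction(sum(s == "hearts" for r, s in deck), len(deck))
p_king = Fraction(sum(r == "K" for r, s in deck), len(deck))
p_both = Fraction(sum(r == "K" and s == "hearts" for r, s in deck), len(deck))

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
p_rule = p_heart + p_king - p_both
# Direct count of cards that are a heart OR a king
p_count = Fraction(sum(r == "K" or s == "hearts" for r, s in deck), len(deck))

print(p_rule, p_count)  # 4/13 4/13
```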
Multiplication Rule (if independent events):
- P(A∩B)=P(A)×P(B)
Example:
If P(Rain) = 0.7 and P(Traffic) = 0.4 (independent),
P(Rain ∧ Traffic) = 0.7 × 0.4 = 0.28
MCQs
Q4.1 Which of the following is a valid probability value?
(a) -0.5
(b) 2
(c) 0.6
(d) 5
Q4.2 A die is rolled. What is the probability of getting an even number?
(a) 1/2
(b) 1/3
(c) 2/3
(d) 1/6
Q4.3 If P(Rain) = 0.7, P(Traffic) = 0.4 (independent), then P(Rain ∧ Traffic) = ?
(a) 0.28
(b) 0.3
(c) 0.9
(d) 0.11
Answers: Q4.1 → (c), Q4.2 → (a), Q4.3 → (a)
Practice Questions
- A card is drawn from a deck. What is the probability that it is either a heart or a king?
- A bag contains 3 red and 2 blue balls. Two balls are drawn without replacement. Find the probability that both are red.
(b) Conditional Probability (useful when events depend on each other)
- The probability of event A given that event B has already occurred.
- Formula:
P(A∣B) = P(A∩B)/P(B)
Example-1: In a class of 100 students, 70 like apples, 30 like oranges, and 20 like both apples and oranges. What is the probability that a student likes apples, given that the student already likes oranges?
Problem Statement:
In a class of 100 students (let A = "likes apples" and B = "likes oranges"):
• 70 like apples
• 30 like oranges
• 20 like both apples and oranges
Conditional Probability Formula:
P(A | B) = P(A ∩ B) / P(B)
Step Calculations:
• P(A ∩ B) = 20 / 100 = 0.2
• P(B) = 30 / 100 = 0.3
• P(A | B) = 0.2 / 0.3 = 2/3
Answer: Probability that a student likes apples given they like oranges: 2/3
Example-2: In a class, 60% of the students are boys and 40% are girls. 30% of the boys and 20% of the girls wear glasses.
- A student picked at random is wearing glasses. What is P(Boy | Glasses)?
- P(Boy ∩ Glasses) = 0.6 × 0.3 = 0.18
- P(Glasses) = 0.18 + (0.4 × 0.2) = 0.18 + 0.08 = 0.26
- P(Boy | Glasses) = 0.18 / 0.26 ≈ 0.692
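Both examples apply the same formula, so a small helper makes the pattern explicit. A minimal sketch; `conditional` is a hypothetical helper name, and `Fraction` keeps the answers exact (2/3 and 9/13).

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


# Example-1: likes apples (A) given likes oranges (B), counts out of 100
print(conditional(Fraction(20, 100), Fraction(30, 100)))  # 2/3

# Example-2: P(Boy | Glasses); P(Glasses) sums over boys and girls
p_boy, p_girl = Fraction(60, 100), Fraction(40, 100)
p_boy_and_glasses = p_boy * Fraction(30, 100)                # 0.18
p_glasses = p_boy_and_glasses + p_girl * Fraction(20, 100)   # 0.26
print(conditional(p_boy_and_glasses, p_glasses))  # 9/13 (≈ 0.692)
```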
MCQs
Q4.4 Formula for conditional probability is:
(a) P(A)×P(B)
(b) P(A∪B)/P(B)
(c) P(A∩B)/P(B)
(d) P(A)+P(B)
Q4.5 If P(A) = 0.5, P(B) = 0.4, P(A ∧ B) = 0.2, then P(A|B) = ?
(a) 0.8
(b) 0.5
(c) 0.4
(d) 0.2
Answers: Q4.4 → (c), Q4.5 → (b), since P(A|B) = 0.2 / 0.4 = 0.5
Practice Question
- In a box, 70% of bulbs are good, 20% are faulty but repairable, and 10% are broken. If a bulb is tested and found not broken, what is the probability that it is good?
(c) Bayesian Networks
- A Bayesian network is a graphical model that represents probabilistic relationships among variables.
- Nodes = variables, directed Edges = dependencies (a parent node influences its child).
We can show this using a graphical model:
Example-1:
Disease → Test Result
- Node Disease (yes/no) influences Test Result (positive/negative).
- This shows dependency: whether you test positive depends on whether you have the disease.
Example-2: Intuitive Example with 10,000 People
- Out of 10,000 people:
- 100 have disease (1%)
- 9,900 don’t
- Test results:
- True positives = 100 × 0.9 = 90
- False negatives = 100 × 0.1 = 10
- False positives = 9,900 × 0.1 = 990
- True negatives = 9,900 × 0.9 = 8,910
Step 1: How many people have the disease?
- Prevalence = 1%
- Out of 10,000 → 10,000 × 0.01 = 100 have the disease.
- So, 10,000 − 100 = 9,900 do not have the disease.
Step 2: Test accuracy
- The test is 90% accurate.
- If you have the disease, it correctly says “positive” 90% of the time (true positive).
- If you don’t have the disease, it correctly says “negative” 90% of the time (true negative).
- That also means 10% of the time it makes mistakes (false positives or false negatives).
Step 3: Apply this to the groups
- For the 100 sick people:
- 90% detected → 90 true positives
- 10% missed → 10 false negatives
- For the 9,900 healthy people:
- 90% correctly shown as negative → 8,910 true negatives
- 10% wrongly shown as positive → 990 false positives
Bayes Theorem:
P(A | B) = [P(B | A) × P(A)] / P(B)
Where:
• P(A | B) = Probability of event A given B has occurred
• P(B | A) = Probability of event B given A has occurred
• P(A) = Probability of event A
• P(B) = Probability of event B
Example (Medical Diagnosis):
- 1% of population has a disease. Test detects it with 90% accuracy.
- If a person tests positive, actual probability of having disease is much less than 90% because false positives matter.
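Final Answer:
With these numbers, only 90 of the 90 + 990 = 1,080 positive tests are true positives, so if a person tests positive, the probability they actually have the disease is 90 / 1,080 ≈ 8.3%, not 90%.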
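Bayes' theorem for a yes/no test can be wrapped in one function, expanding P(positive) with the law of total probability. A minimal sketch; `posterior` is a hypothetical helper name, "90% accuracy" is read as 90% sensitivity and a 10% false-positive rate (as in the 10,000-person walk-through), and the same function also covers the defective-item question that follows.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    P(+) = P(+ | condition) P(condition) + P(+ | no condition) P(no condition)
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Medical example: 1% prevalence, 90% sensitivity, 10% false positives
print(round(posterior(0.01, 0.90, 0.10), 3))  # 0.083

# Defective items: 1% defective, 95% detection, 2% false alarms
print(round(posterior(0.01, 0.95, 0.02), 3))  # 0.324
```

The first result matches the head count: 90 true positives out of 1,080 positive tests.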
MCQs
Q4.6 In Bayesian networks, edges represent:
(a) Probabilistic dependencies
(b) Deterministic rules
(c) Logical contradictions
(d) Utilities
Q4.7 Bayes theorem is used to:
(a) Update probability with new evidence
(b) Calculate maximum utility
(c) Remove uncertainty
(d) Create knowledge bases
Answers: Q4.6 → (a), Q4.7 → (a)
Practice Question
A factory machine produces 1% defective items. A test detects defects with 95% accuracy, but also wrongly flags good items 2% of the time. If an item is marked defective, what is the probability it is actually defective?
Conditional Probability Example – Defective Item Test
Given:
• A factory machine produces 1% defective items
• Test detects defects with 95% accuracy
• Test wrongly flags good items 2% of the time
Formula (Bayes’ Theorem):
P(Defective | Test+) = [P(Test+ | Defective) × P(Defective)] / P(Test+)
Step Calculations:
• P(Test+) = (0.95 × 0.01) + (0.02 × 0.99) = 0.0293
• P(Defective | Test+) = 0.0095 / 0.0293 ≈ 0.324
Answer:
✅ Probability that an item marked defective is actually defective: ≈ 32.4%
Interpretation:
- Even though the test is 95% accurate, because defective items are rare, the probability that an item is actually defective given a positive test is only ~32.4%.
- This shows the importance of considering base rates in real-world tests.
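The 32.4% figure can also be checked by simulation: generate many items with the stated rates and measure the fraction of flagged items that are really defective. A rough sketch; the sample size and seed are arbitrary choices, so the estimate lands close to, not exactly on, the Bayes result.

```python
import random


def simulate_flagged_defective(n_items=200_000, seed=0):
    """Estimate P(defective | flagged) by simulating a production run."""
    rng = random.Random(seed)
    flagged = flagged_and_defective = 0
    for _ in range(n_items):
        defective = rng.random() < 0.01          # 1% base rate
        if defective:
            positive = rng.random() < 0.95       # detection rate
        else:
            positive = rng.random() < 0.02       # false-alarm rate
        if positive:
            flagged += 1
            flagged_and_defective += defective   # True counts as 1
    return flagged_and_defective / flagged


print(simulate_flagged_defective())  # close to 0.324; varies with the seed
```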
Decision Making
(a) Expected Utility
- Decision-making principle: choose the action with the highest expected benefit.
Formula:
EU(action) = Σ [ P(outcome) × Utility(outcome) ]
Problem-1:
- Toss a fair coin:
- Heads → Win ₹100
- Tails → Lose ₹50
- Probability of Heads = 0.5, Tails = 0.5
- Utility = Money won/lost
- EU = (0.5 × 100) + (0.5 × −50) = 50 − 25 = ₹25, so on average the game gains ₹25 per toss.
Problem-2:
You are offered a game:
- Roll a fair 6-sided die.
- If you roll a 6, you win ₹150.
- If you roll 1–5, you lose ₹40.
Task: Calculate the Expected Utility (EU) of playing this game. Based on the result, decide whether it is rational to play.
Step 1: Identify probabilities and outcomes
- P(roll 6) = 1/6 → Outcome = +₹150
- P(roll 1–5) = 5/6 → Outcome = -₹40
Step 2: Apply Expected Utility formula
EU(game) = Σ [ P(outcome) × Utility(outcome) ]
EU = (1/6 × 150) + (5/6 × -40)
EU = 25 – 33.33
EU ≈ -8.33
Step 3: Decision
Expected utility is negative (-₹8.33).
✅ Rational decision: It is not rational to play, because on average you lose money.
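The formula EU = Σ P(outcome) × Utility(outcome) translates directly into code. A minimal sketch; `expected_utility` is a hypothetical helper, utility is taken to be the money won or lost (as in both problems), and `Fraction` keeps 1/6 and 5/6 exact so the probability check is reliable.

```python
from fractions import Fraction


def expected_utility(outcomes):
    """EU = Σ P(outcome) × Utility(outcome) for (probability, utility) pairs."""
    if sum(p for p, _ in outcomes) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


coin = [(Fraction(1, 2), 100), (Fraction(1, 2), -50)]   # Problem-1
die = [(Fraction(1, 6), 150), (Fraction(5, 6), -40)]    # Problem-2

for name, game in [("coin", coin), ("die", die)]:
    eu = expected_utility(game)
    decision = "play" if eu > 0 else "do not play"
    print(name, round(float(eu), 2), decision)
```

It prints `coin 25.0 play` and `die -8.33 do not play`, matching the hand calculation.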
MCQs
Q5.1 Expected Utility is computed as:
(a) Sum of all rewards
(b) Probability × Utility summed across outcomes
(c) Difference between maximum and minimum rewards
(d) Reward divided by probability
Q5.2 If P(Win) = 0.4, Reward = 50, P(Lose) = 0.6, Reward = -20, then EU = ?
(a) 10
(b) 8
(c) -5
(d) 14
✅ Answers: Q5.1 → (b), Q5.2 → (b)
Practice Question
A game gives you ₹200 with probability 0.3, ₹50 with probability 0.5, and -₹100 with probability 0.2. Compute Expected Utility.
(b) Markov Decision Processes (MDPs)
- Framework for decision-making under uncertainty.
- Defined by:
- States (S): Possible situations.
- Actions (A): Choices available.
- Transition Model (T): Probability of moving between states.
- Rewards (R): Value received.
Real-life Example:
- Robot vacuum cleaner:
- States: Room dirty/clean.
- Actions: Clean, Move.
- Reward: Higher for clean room.
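Value iteration is one standard way to solve a small MDP: repeatedly set each state's value to the best expected reward plus discounted future value, then act greedily. A minimal sketch of the robot-vacuum example; every number here (transition probabilities, the +10 reward, the cleaning cost of 3, the discount 0.9) is an assumption chosen for illustration. With these numbers the best policy is to clean a dirty room and simply move on in a clean one.

```python
# Hypothetical two-state MDP for the robot-vacuum example.
# TRANSITIONS[state][action] -> list of (next_state, probability)
TRANSITIONS = {
    "dirty": {"clean": [("clean", 0.9), ("dirty", 0.1)],
              "move": [("dirty", 1.0)]},
    "clean": {"clean": [("clean", 1.0)],
              "move": [("clean", 0.8), ("dirty", 0.2)]},
}
CLEAN_COST = 3.0   # assumed cost of running the vacuum
GAMMA = 0.9        # assumed discount factor for future rewards


def reward(action, next_state):
    """+10 for ending in a clean room, minus the cost if the vacuum ran."""
    r = 10.0 if next_state == "clean" else 0.0
    return r - CLEAN_COST if action == "clean" else r


def q_value(state, action, V):
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (reward(action, nxt) + GAMMA * V[nxt])
               for nxt, p in TRANSITIONS[state][action])


def value_iteration(tol=1e-9):
    V = {s: 0.0 for s in TRANSITIONS}
    while True:
        # Bellman optimality update: best action value in every state
        new_V = {s: max(q_value(s, a, V) for a in TRANSITIONS[s])
                 for s in TRANSITIONS}
        converged = max(abs(new_V[s] - V[s]) for s in V) < tol
        V = new_V
        if converged:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(TRANSITIONS[s], key=lambda a: q_value(s, a, V))
              for s in TRANSITIONS}
    return V, policy


V, policy = value_iteration()
print(policy)  # {'dirty': 'clean', 'clean': 'move'}
```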
MCQs
Q5.3 Which component is NOT part of a Markov Decision Process (MDP)?
(a) States
(b) Actions
(c) Rewards
(d) Knowledge base
Q5.4 The Markov property states that:
(a) Future state depends only on the current state
(b) Future state depends on entire history
(c) States are independent of actions
(d) Rewards are fixed
✅ Answers: Q5.3 → (d), Q5.4 → (a)
Practice Question
Explain how an MDP can be used for planning in a self-driving car scenario.
An MDP helps a self-driving car plan by:
- States: road, traffic, speed, signals.
- Actions: accelerate, brake, turn, stop.
- Transitions: model uncertainty (e.g., other cars’ moves).
- Rewards: safe driving, reaching the destination, penalties for collisions/violations.
Hill Climbing Method
- Hill Climbing is an iterative optimization algorithm that starts with an arbitrary solution and moves step by step to a better solution until no further improvement is possible.
- “Always move uphill towards higher value (better solution).”
- It may stop at a local maximum instead of the global maximum.
- Example:
Imagine finding the tallest hill in a foggy region. You start climbing in any direction where the slope increases. You stop when no higher point is nearby, but this might not be the tallest hill in the whole area.
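The foggy-hill picture can be sketched on a one-dimensional landscape: always step to the higher neighbour, and stop when neither neighbour is higher. A minimal sketch; the `landscape` heights are hypothetical and were chosen so that starting on the left gets trapped on a small peak.

```python
def hill_climb(heights, start):
    """Greedy hill climbing on a list of heights; returns visited indices."""
    path = [start]
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1)
                      if 0 <= i < len(heights)]
        if not neighbours:
            return path
        best = max(neighbours, key=lambda i: heights[i])
        if heights[best] <= heights[current]:
            return path          # no uphill move left: a (local) maximum
        current = best
        path.append(current)


landscape = [1, 3, 5, 4, 2, 6, 9, 7]   # hypothetical heights
path = hill_climb(landscape, start=0)
print(path, landscape[path[-1]])        # [0, 1, 2] 5
print(max(landscape))                   # 9: the global maximum was missed
```

Starting at index 4 instead climbs to the global peak (indices 4 → 5 → 6), which is why random restarts are a common fix.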
Practice Question
- A fitness function is defined as:
f(x) = -x^2 + 10x
Start from x = 2, take step size = 1, and use hill climbing to reach the maximum. Show steps.
- You are given the following 5×5 matrix representing the height of a terrain:
[[1, 2, 3, 2, 1],
 [4, 6, 8, 5, 2],
 [3, 7, 9, 6, 3],
 [2, 4, 7, 5, 1],
 [1, 2, 3, 2, 0]]
Start from position (0,0) → value 1, and apply Hill Climbing (you can only move up, down, left, or right if the neighbor has a higher value).
– Trace the path until no higher value is found.
– Report the local maximum reached and compare it with the global maximum of the matrix.
MCQs
1. Hill Climbing is mainly used for:
a) Classification
b) Optimization
c) Clustering
d) Regression
2. Which is a drawback of Hill Climbing?
a) Requires large datasets
b) Gets stuck in local maxima
c) Cannot work on optimization problems
d) Too slow for small data
3. Hill Climbing always moves:
a) Randomly
b) Downhill
c) To a neighbor with better value
d) To the global maximum directly
Answers: 1 → b, 2 → b, 3 → c
Machine Learning
What is Machine Learning?
Machine Learning (ML) is a subset of Artificial Intelligence where systems learn patterns from data and improve their performance on tasks without being explicitly programmed.
By Tom Mitchell:
“A computer program is said to learn from experience (E) with respect to some tasks (T) and some performance measure (P) if its performance at T improves with experience E.”
Example:
Spam filtering: Model learns from labeled emails (spam/not spam) and predicts new emails.
MCQs
1. Machine Learning enables systems to:
a) Execute only fixed instructions
b) Learn and improve from data
c) Work without hardware
d) Store more memory
2. Which of the following is NOT an ML application?
a) Image recognition
b) Spam detection
c) File compression
d) Speech recognition
Answers: 1 → b, 2 → c
Types of Machine Learning

(A) Supervised Learning
It uses labeled data (input + output pairs) and learns a mapping from input → output.
Algorithms: Linear Regression, Decision Trees, SVM.
Example: Predicting house prices from size, location, and other features.
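The simplest supervised model is a straight line fitted by least squares: learn w and b from labelled (size, price) pairs, then predict prices for unseen sizes. A minimal sketch with one feature only; the sizes and prices are made-up numbers (a real model would also use location and other features).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical labelled data: size (sq. m) -> price (lakh ₹)
sizes = [50, 60, 80, 100, 120]
prices = [25, 30, 40, 50, 60]
w, b = fit_line(sizes, prices)
print(w, b)          # 0.5 0.0  (the learned mapping)
print(w * 90 + b)    # 45.0     (prediction for an unseen 90 sq. m house)
```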
(B) Unsupervised Learning
* It uses unlabeled data (only inputs).
* Such algorithms discover hidden patterns or groups.
* Algorithms: K-Means, Hierarchical Clustering, PCA.
Example: Grouping customers into segments based on shopping habits.
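K-Means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal sketch on hypothetical customer data; the starting centroids are fixed by hand and the loop runs a fixed number of times, whereas practical implementations stop at convergence and use smarter initialisation (e.g. k-means++).

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain K-Means with given starting centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (keep it if the cluster is empty)
        centroids = [tuple(sum(col) / len(cluster) for col in zip(*cluster))
                     if cluster else centroids[i]
                     for i, cluster in enumerate(clusters)]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend in ₹ hundreds)
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (8, 8)]
centroids, clusters = kmeans(customers, centroids=[(1, 2), (9, 8)])
print(clusters)  # occasional shoppers vs. frequent big spenders
```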
(C) Reinforcement Learning
An agent learns by interacting with an environment and receiving positive and negative rewards.
Key concepts: Agent, Environment, Actions, Rewards, States.
Goal: Maximize cumulative reward.
Algorithms:
- Q-Learning → Value-based; learns the best action for each state (as a table or a function).
- Deep Q-Network (DQN, by DeepMind) → Extension of Q-Learning that uses a deep neural network to estimate action values.
- Proximal Policy Optimization (PPO, by OpenAI) → Popular policy-gradient method, used in robotics and games.
- AlphaGo / AlphaZero (by DeepMind) → Game-playing systems that combine deep RL with Monte Carlo tree search.
Example: Robot learning to walk by trial and error.
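Q-Learning can be seen working on a toy problem: a 5-cell corridor where the agent starts at cell 0 and earns +1 only on reaching cell 4. A minimal sketch; the corridor, the rewards, and the hyperparameters (α = 0.5, γ = 0.9, ε = 0.2, 200 episodes) are all assumptions. After training, the greedy policy is expected to move right in every non-goal cell.

```python
import random

N_STATES = 5                           # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]                     # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(state, action):
    """Environment: move inside the corridor; reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(Q, state, rng):
    """Best-valued action, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # ε-greedy: explore with probability EPSILON, otherwise exploit
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q, state, rng)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            # Q-learning update towards reward + discounted best next value
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q


Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1, 1] (always move right, towards the goal)
```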
MCQs
1. Supervised learning requires:
a) Only input data
b) Input-output pairs
c) Rewards and penalties
d) No data
2. Predicting exam marks from study hours is:
a) Regression
b) Clustering
c) Reinforcement
d) Association
3. Clustering belongs to:
a) Supervised learning
b) Unsupervised learning
c) Reinforcement learning
d) None of these
4. Which algorithm is commonly used for clustering?
a) Linear Regression
b) K-Means
c) Logistic Regression
d) SVM
5. In reinforcement learning, the agent learns through:
a) Supervised labels
b) Random guesses
c) Rewards & penalties
d) Manual programming
6. A self-driving car is best trained with:
a) Supervised learning
b) Reinforcement learning
c) Unsupervised learning
d) None
Answers: 1 → b, 2 → a, 3 → b, 4 → b, 5 → c, 6 → b
ML and Data Preprocessing
1. Applications and Case Studies of ML:
Machine Learning (ML) is widely applied across industries to automate tasks, make predictions, and generate insights.
Real-Life Applications:
| Domain | Application Example |
| Healthcare | Predicting patient readmissions, cancer detection using image classification |
| Finance | Credit scoring, fraud detection in transactions |
| Retail | Product recommendation systems (e.g., Amazon, Flipkart) |
| Transportation | Self-driving cars using object detection, route optimization |
| Agriculture | Predicting crop yields using satellite and sensor data |
| Entertainment | Personalized content recommendations (e.g., Netflix, YouTube) |
| Manufacturing | Predictive maintenance, quality control automation |
Case Study:
Netflix Recommendation System: Netflix uses ML models (collaborative filtering + deep learning) to recommend shows and movies based on user history, ratings, clicks and global trends.
MCQs:
1. Which of the following is NOT a typical application of machine learning?
a) Credit scoring
b) Weather forecasting
c) Book printing
d) Face recognition
Answer: c) Book printing
2. Which ML technique is most used for detecting spam emails?
a) Regression
b) Clustering
c) Classification
d) Reinforcement Learning
Answer: c) Classification
3. In Netflix’s recommendation engine, which ML concept is mainly used?
a) Unsupervised learning
b) Dimensionality reduction
c) Collaborative filtering
d) K-means clustering
Answer: c) Collaborative filtering
4. Which industry heavily relies on ML for predictive maintenance?
a) Education
b) Manufacturing
c) Entertainment
d) Retail
Answer: b) Manufacturing
5. What kind of learning is used in self-driving cars?
a) Reinforcement Learning
b) Clustering
c) Logistic Regression
d) PCA
Answer: a) Reinforcement Learning
Practice Question:
Research an ML use-case in your local area (e.g., smart city, health monitoring, traffic prediction) and write 100 words describing how ML is applied and what data is used.
2. Importance of Data Preprocessing:
Data pre-processing transforms raw data into a clean and usable format before applying ML algorithms.
Key Steps:
- Handling missing values
- Removing duplicates
- Encoding categorical data
- Scaling or normalizing
- Splitting data (train/test)
Real-Life Example:
In a loan approval dataset, applicants might leave some fields blank (e.g., income or employment). Before training the ML model, these missing fields must be imputed or removed.
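The key steps above can be sketched in plain Python on a few hypothetical loan records (in practice pandas offers `drop_duplicates()`, `fillna()` and `get_dummies()` for the same jobs). The records, the choice of median imputation, and the "unknown" placeholder are illustrative assumptions; the split here is a simple 75/25 cut, and real data should be shuffled first.

```python
from statistics import median

# Hypothetical loan-application records; None marks a blank field
rows = [
    {"income": 50_000, "employment": "salaried", "approved": 1},
    {"income": None, "employment": "self-employed", "approved": 0},
    {"income": 50_000, "employment": "salaried", "approved": 1},  # duplicate
    {"income": 80_000, "employment": None, "approved": 1},
    {"income": 30_000, "employment": "unemployed", "approved": 0},
]

# 1. Remove exact duplicate rows (keep the first occurrence)
seen, unique = set(), []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        unique.append(r)

# 2. Impute missing values: median for numbers, a placeholder for categories
med_income = median(r["income"] for r in unique if r["income"] is not None)
for r in unique:
    if r["income"] is None:
        r["income"] = med_income
    if r["employment"] is None:
        r["employment"] = "unknown"

# 3. One-hot encode the categorical column
categories = sorted({r["employment"] for r in unique})
X = [[r["income"]] + [int(r["employment"] == c) for c in categories]
     for r in unique]
y = [r["approved"] for r in unique]

# 4. Split into train/test sets (first 75% / last 25%)
cut = int(0.75 * len(X))
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]
print(categories)
print(X_train)
```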
MCQs:
1. What is the main purpose of data pre-processing?
a) To make predictions
b) To clean and prepare data for ML models
c) To store the data
d) To visualize the data
Answer: b) To clean and prepare data for ML models
2. Which of the following is NOT part of data pre-processing?
a) Feature scaling
b) Model evaluation
c) Handling missing data
d) Encoding categorical variables
Answer: b) Model evaluation
3. Why is data splitting important?
a) To reduce dataset size
b) To detect overfitting by evaluating the model on unseen data
c) To improve visualization
d) To store data
Answer: b) To detect overfitting by evaluating the model on unseen data
4. Which of the following may require one-hot encoding?
a) Numerical features
b) Text columns
c) Categorical variables
d) Continuous variables
Answer: c) Categorical variables
Practice Question:
Given a dataset with missing values and categorical features, describe the steps you’d take to pre-process it for a machine learning task.
3. Data Cleaning, Normalization, and Transformation:
✅ Data Cleaning:
- Remove duplicates
- Handle missing data (mean, median, mode imputation)
- Fix inconsistent formatting
✅ Normalization:
- Brings all features to the same scale.
- Useful for distance-based algorithms (e.g., KNN, SVM).
- Common methods: Min-Max Scaling, Z-score Standardization
✅ Transformation:
- Converting skewed data (e.g., using log/sqrt transformation)
- Encoding categorical variables
Real-Life Example:
In a customer churn prediction dataset:
- Clean: remove duplicate customer records.
- Normalize: scale income and age for fair comparison.
- Transform: convert “Yes/No” churn column to 1/0.
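The normalization and transformation steps can be written out directly. A minimal sketch on hypothetical churn-dataset columns; the standard-library `statistics.pstdev` (population standard deviation) is used for the z-score, and `log10` is one of several possible log transforms.

```python
import math
from statistics import mean, pstdev


def min_max(values):
    """Min-max scaling to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature: min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score standardization: mean 0, (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical churn-dataset columns
ages = [20, 30, 40, 50, 60]
incomes = [20_000, 25_000, 30_000, 40_000, 500_000]  # right-skewed by one outlier
churn = ["Yes", "No", "No", "Yes", "No"]

print(min_max(ages))                               # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 2) for z in z_score(ages)])
print([round(math.log10(x), 2) for x in incomes])  # log squeezes the outlier
print([1 if c == "Yes" else 0 for c in churn])     # [1, 0, 0, 1, 0]
```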
MCQs:
1. Which technique brings all values into the 0–1 range?
a) Log transformation
b) Z-score normalization
c) Min-max normalization
d) One-hot encoding
Answer: c) Min-max normalization
2. Which function is used to reduce skewness in data?
a) Dropna
b) Log transformation
c) Mean imputation
d) Label encoding
Answer: b) Log transformation
3. Why do we normalize features?
a) To remove null values
b) To ensure uniform scaling
c) To improve dataset size
d) To clean categorical data
Answer: b) To ensure uniform scaling
4. Which pandas method removes duplicate rows?
a) dropna()
b) drop_duplicates()
c) fillna()
d) remove()
Answer: b) drop_duplicates()
Practice Question:
Download any open dataset (e.g., from Kaggle or UCI), and apply normalization and log transformation. Observe the effect using histograms.
4. Feature Selection and Dimensionality Reduction:
✅ Feature Selection:
Choosing only the most relevant features to improve model performance and reduce complexity.
Methods:
- Filter (e.g., correlation)
- Wrapper (e.g., RFE)
- Embedded (e.g., Lasso)
✅ Dimensionality Reduction:
Reducing the number of input variables using transformation techniques.
Methods:
- PCA (Principal Component Analysis)
- t-SNE (for visualization)
- LDA (Linear Discriminant Analysis)
Real-Life Example:
In image classification (e.g., digits), images have thousands of pixels. PCA can reduce dimensionality while keeping most variance.
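The core of PCA is finding the direction of greatest variance (the top eigenvector of the covariance matrix) and projecting the data onto it. A minimal pure-Python sketch using power iteration; the 3-feature data is hypothetical and lies almost on a line, so one component captures nearly everything. In practice a library routine such as scikit-learn's `PCA(n_components=...)` would be used instead.

```python
import math


def first_principal_component(data, iterations=200):
    """Top principal direction of `data` (a list of rows) via power iteration."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d                       # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]       # renormalize every step
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected


data = [[1, 2, 3.1], [2, 4.1, 6], [3, 5.9, 9], [4, 8, 12.1]]  # hypothetical
v, z = first_principal_component(data)
print([round(x, 3) for x in v])   # direction close to (1, 2, 3)/√14
print([round(x, 2) for x in z])   # 3 features reduced to 1 per sample
```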
MCQs:
1. Which of these is a dimensionality reduction technique?
a) SVM
b) PCA
c) Logistic Regression
d) Random Forest
Answer: b) PCA
2. What is the goal of feature selection?
a) Add more features
b) Remove noisy or redundant features
c) Increase dimensionality
d) Improve data visualization
Answer: b) Remove noisy or redundant features
3. Which algorithm can be used for feature ranking?
a) RFE
b) K-means
c) KNN
d) DBSCAN
Answer: a) RFE
4. Which of the following reduces the number of correlated variables?
a) Encoding
b) PCA
c) Normalization
d) Imputation
Answer: b) PCA
5. Which is not a method of feature selection?
a) Filter method
b) Wrapper method
c) Log transformation
d) Embedded method
Answer: c) Log transformation
Practice Question:
Apply PCA on the Iris dataset and reduce features from 4 to 2. Visualize the result using a scatter plot and comment on the separability of classes.