AI-900
Module 1: Introduction to AI Concepts
LLM (Large Language Models):
Takes text as input and produces text as output.
It is trained on massive amounts of text data from books, websites, and other sources to learn grammar, facts, reasoning, and context.
Works using tokenization: input text is split into tokens (words or sub-word pieces) that the model processes numerically.
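To make the idea concrete, here is a toy word-level tokenizer. This is only an illustrative sketch: real LLMs use sub-word schemes such as byte-pair encoding, and the `tokenize`/`build_vocab` functions here are hypothetical helpers, not any library's API.

```python
# Toy word-level tokenizer (real LLMs use sub-word tokenization, e.g. BPE).

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return text.lower().split()

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each unique token to an integer ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

tokens = tokenize("The cat sat on the mat")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(ids)     # [0, 1, 2, 3, 0, 4]
```

The model never sees raw text, only sequences of token IDs like the one printed above.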
SLM (Small Language Models):
A Small Language Model works on the same concept as an LLM but with fewer parameters and a smaller training dataset.
While it’s less powerful, it’s faster, more cost-efficient, and can run on local devices like laptops or smartphones.
Computer Vision:
What is Computer Vision in AI?
Computer vision is a field of artificial intelligence that enables machines to see, understand, and interpret visual information from the world around them, just like humans do. It works by training AI models on large sets of labeled images. This training helps the system recognize patterns, objects, and scenes automatically.
Through computer vision, machines can identify what an image contains. They can also locate, classify, and analyze the objects within it.
How Computer Vision Works
Image Classification: The model learns to recognize the main subject in an image (e.g., “cat,” “car,” or “tree”).
Object Detection: The AI detects specific objects within an image and marks their positions.
Semantic Segmentation: This technique identifies each pixel belonging to a specific object, providing detailed visual understanding.
Multi-Modal Models: By combining computer vision with language models (like GPT), AI systems can analyze images and describe them in natural language. This bridges the gap between vision and text understanding.
Real-World Applications
Computer vision powers many technologies we use every day, including:
- Auto-captioning and tag generation for photos
- Visual search tools
- Retail automation (stock monitoring, item recognition)
- Security and surveillance systems
- Facial recognition for authentication
- Robotics and self-driving vehicles
In short:
Computer vision allows AI to make sense of the visual world by identifying, describing, and interacting with what it “sees.” This forms the foundation of modern visual intelligence systems.
What is Speech Technology in AI?
Speech Technology is a part of artificial intelligence that helps machines hear, understand, and generate human speech. It connects human language with computer systems, allowing people to interact with technology in a natural way—simply by speaking.
There are two main components that make this possible:
Key Components of AI Speech
Speech Recognition (Speech-to-Text):
This lets AI systems listen to and interpret spoken words by turning them into written text. It’s how digital assistants understand voice commands and how meetings are automatically transcribed.
Speech Synthesis (Text-to-Speech):
This allows AI to respond by converting written text into realistic spoken words, often using natural-sounding voices.
AI speech systems are becoming more capable. They can filter background noise, recognize accents, detect interruptions, and produce tones and emotions that sound more human-like.
Real-World Applications
AI speech technology is used in many everyday tools and services, such as:
Voice assistants on smartphones, computers, and smart home devices
Automated transcription for calls, lectures, or meetings
Audio descriptions for videos or written content
Real-time language translation through speech
In Short
AI speech technology gives computers a voice and ears, enabling natural, spoken interaction between humans and machines.
What is Natural Language Processing (NLP) in AI?
Natural Language Processing (NLP) is an important area of artificial intelligence that allows computers to understand, interpret, and generate human language in a meaningful way. It helps machines make sense of text, whether that’s analyzing a document, classifying information, or responding in a conversation.
While many modern NLP tasks rely on Generative AI models, simpler and more focused NLP models are still cost-effective and efficient for common text analytics use cases.
Key NLP Tasks
- Entity Extraction: Identifies key elements like people, places, organizations, or products mentioned in text.
- Text Classification: Categorizes documents or messages into predefined topics or groups.
- Sentiment Analysis: Determines the emotional tone of text, whether it is positive, negative, or neutral.
- Language Detection: Recognizes which language a text is written in.
Note: NLP is often called Natural Language Understanding (NLU) when the focus is on extracting meaning and intent from human language.
Real-World Applications
NLP plays a vital role in many industries and services, such as:
- Analyzing documents, transcripts, or customer interactions to extract useful information
- Monitoring social media, reviews, and feedback to gauge sentiment and public opinion
- Powering chatbots and virtual assistants for automated customer support
- Classifying and organizing large volumes of unstructured text data
In Short
Natural Language Processing gives AI the ability to read and understand language. It transforms written or spoken words into actionable insights and intelligent responses.
Using AI to Extract Data and Insights
AI-powered data and insight extraction uses artificial intelligence to automatically identify, read, and interpret information from various sources, such as documents, images, or even audio and video. This technology transforms unstructured data into useful insights that can guide decisions and automate workflows.
Many document analysis solutions rely on Optical Character Recognition (OCR), a type of computer vision that allows AI to detect and extract text from images or scanned documents.
How It Works
Optical Character Recognition (OCR): Detects and converts text from images or scanned pages into editable, machine-readable text.
Field Extraction: Advanced models do more than just recognize text; they identify and pull specific fields or values, such as names, dates, or totals, from forms and structured documents.
Multi-Modal Extraction: Modern AI systems can extract insights not only from text but also from audio recordings, images, and videos, which broadens the range of information analysis.
Real-World Applications
AI-driven data and insight extraction is widely used in many industries for:
- Automating document and form processing, such as invoices and expense claims.
- Large-scale digitization of paper records, including census data or archives.
- Indexing and organizing documents for quick and efficient searches.
- Extracting action points or summaries from meeting transcripts or recordings.
In Short
AI data extraction converts unstructured content from documents, images, or audio into structured, actionable insights that improve efficiency and decision-making.
Module 2: Introduction to Machine Learning Concepts
Introduction to Machine Learning
Machine Learning (ML) combines two powerful fields, data science and software engineering. Its main goal is to use data to build models that can predict outcomes. These models can be integrated into applications or services. They allow systems to learn from experience and make smart predictions or decisions without explicit programming.
Machine learning relies on the skills of:
Data scientists, who explore, clean, and prepare data. They also train and evaluate models.
Software developers, who deploy these models into real-world applications to make predictions. This process is known as inferencing.
The Core Idea
At its core, machine learning comes from statistics and mathematical modeling. It is based on learning patterns from past observations to predict unknown outcomes.
Here are a few practical examples:
Retail: An ice cream shop owner might use past sales and weather data to predict how many ice creams will sell on a given day based on the forecast.
Healthcare: A doctor could analyze past clinical data to see if a patient is at risk of diabetes by looking at factors like weight and blood glucose levels.
Research: A scientist studying penguins in Antarctica might train a model to identify species, such as Adelie, Gentoo, or Chinstrap, based on measurements of their flippers, bills, and other features.
Understanding Training and Inferencing in Machine Learning
Machine learning (ML) relies heavily on math and statistics. ML models are often described using mathematical terms. At its base, a machine learning model is a function that calculates a value (a prediction) based on one or more inputs (features).
The process of defining this function with existing data is called training. Once a model is trained, using it to predict new values is known as inferencing.
The Two Phases of Machine Learning
1. Training Phase
In training, we use historical data, known as training data, which includes:
Features (x): The attributes or characteristics used for prediction.
Label (y): The known value or outcome we want the model to learn to predict.
In mathematical notation, a single observation often looks like this:
x = [x₁, x₂, x₃, …] → y
The goal of training is to find a function f(x) that best maps input features to output labels:
y = f(x)
Let’s look at some examples to clarify this:
Ice Cream Sales:
Features (x): Weather data, including temperature, rainfall, and windspeed.
Label (y): The number of ice creams sold.
Medical Diagnosis:
Features (x): Patient metrics, such as weight and glucose level.
Label (y): The risk of diabetes (1 = at risk, 0 = not at risk).
Penguin Classification:
Features (x): Physical attributes, including flipper length and bill width.
Label (y): Penguin species (0 = Adelie, 1 = Gentoo, 2 = Chinstrap).
During training, we apply an algorithm to this data to find relationships between x (features) and y (labels). The algorithm's job is to find a general mathematical function f that best fits the data, allowing it to predict new outcomes accurately.
2. Inferencing Phase
Once the model is trained, it becomes a predictive tool. You can input new feature values (x), and the model will provide a predicted label (ŷ, pronounced “y-hat”).
For example:
If you input today’s temperature, humidity, and windspeed, the trained model can predict how many ice creams are likely to be sold.
This phase shows the practical use of machine learning, where models turn past data into useful predictions and insights.
In Short
Training teaches the machine how to predict. Inferencing shows where those predictions come to life, turning data-driven models into intelligent, decision-making systems.
Types of Machine Learning
There are several types of machine learning (ML), and the type you choose depends on your prediction task. Broadly, machine learning can be split into supervised and unsupervised methods.
1. Supervised Machine Learning
Supervised ML uses training data with both features (x) and known labels (y). The aim is to learn a function that links features to labels, enabling the model to predict unknown outcomes for new data.
a. Regression
Regression predicts numeric values. Examples include:
- Ice cream sales based on temperature, rainfall, and windspeed
- Property price based on size, number of bedrooms, and location metrics
- Car fuel efficiency based on engine size, weight, and dimensions
b. Classification
Classification predicts categorical labels.
Binary Classification predicts one of two outcomes:
- Whether a patient is at risk of diabetes (yes/no)
- Whether a customer will default on a loan (true/false)
- Whether a marketing email will be opened (positive/negative)
Multiclass Classification predicts one label from multiple classes:
- Penguin species (Adelie, Gentoo, Chinstrap)
- Movie genre (comedy, horror, romance, adventure, science fiction)
Multilabel Classification allows multiple labels for one observation:
- A movie could be both science fiction and comedy
2. Unsupervised Machine Learning
Unsupervised ML works with data that only has features, without any known labels. Models identify patterns and relationships among observations.
Clustering
Clustering groups similar observations into clusters based on feature similarity:
- Group flowers by size, number of leaves, and petals
- Segment customers by demographic and purchasing behavior
The key difference from classification is:
In classification, classes are known, and labels exist in the training data.
In clustering, the algorithm finds groups without pre-existing labels.
Use case: Clustering can define classes for later classification. For example, clustering customers can help identify high-value, frequent-small-purchase, or occasional-large-purchase segments. These categories can then train a supervised classification model.
In Short
Supervised learning predicts outcomes based on labeled data. Unsupervised learning uncovers hidden patterns and groups in unlabeled data. Choosing the right approach depends on whether your data has known outcomes.
Understanding Regression in Machine Learning
Regression is a type of supervised machine learning used to predict numeric values based on input features. These models learn from historical data that includes both features (x) and labels (y) to make predictions for new data.
How Regression Works
The training process for a regression model typically involves:
Splitting the Data: Randomly divide your dataset into a training set for training the model and a validation set for evaluating the model.
Training the Model: Apply a regression algorithm, such as linear regression, to the training data to fit a function that connects features to labels.
Validation: Use the held-back data to make predictions and compare predicted labels (ŷ) with actual labels (y).
Evaluation and Iteration: Assess the model using metrics, change algorithm parameters, or try different algorithms until the predictive accuracy meets expectations.
Example: Predicting Ice Cream Sales
Suppose we want to predict daily ice cream sales based on temperature:
| Temperature (x) | Ice Cream Sales (y) |
|-----------------|---------------------|
| 51 | 1 |
| 65 | 14 |
| 69 | 20 |
| 72 | 23 |
| 75 | 26 |
| 81 | 30 |
We can plot these points on a scatter plot and apply linear regression to fit a straight line:
f(x) = x - 50
Using this function, if the forecasted temperature tomorrow is 77°F, the model predicts:
ŷ = 77 - 50 = 27 ice creams
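The hand-drawn line f(x) = x - 50 can be checked with an ordinary least-squares fit computed directly from the sales table. This is a sketch using only the data above; the fitted slope and intercept come out very close to 1 and -50:

```python
# Ordinary least-squares fit of ice cream sales vs. temperature,
# using the six data points from the table above.
temps = [51, 65, 69, 72, 75, 81]   # features (x)
sales = [1, 14, 20, 23, 26, 30]    # labels (y)

n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(sales) / n

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, sales)) \
        / sum((x - mean_x) ** 2 for x in temps)
intercept = mean_y - slope * mean_x

print(f"f(x) = {slope:.2f}x + {intercept:.2f}")   # close to f(x) = x - 50

# Inferencing: predict sales for a 77°F forecast (close to 77 - 50 = 27).
y_hat = slope * 77 + intercept
print(round(y_hat, 1))
```

This mirrors the two phases described earlier: fitting the function is training, and evaluating it at a new temperature is inferencing.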
Evaluating a Regression Model
To measure the model's accuracy, we use the validation dataset and compare predicted values (ŷ) with actual values (y). Common metrics include:
Mean Absolute Error (MAE): Average of absolute differences between predicted and actual values.
Example: MAE = 2.33 ice creams
Mean Squared Error (MSE): Average of squared differences to highlight larger errors.
Example: MSE = 6
Root Mean Squared Error (RMSE): Square root of MSE, showing error in original units.
Example: RMSE = 2.45 ice creams
R² (Coefficient of Determination): Proportion of variance in actual values explained by the model.
Example: R² = 0.95 (closer to 1 means a better fit)
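All four metrics can be computed directly from pairs of actual and predicted values. The validation pairs below are made-up numbers for illustration; they do not reproduce the 2.33 / 6 / 2.45 / 0.95 example figures quoted above:

```python
import math

# Compute MAE, MSE, RMSE, and R² from a hypothetical validation set
# of actual (y) vs. predicted (y_hat) ice cream sales.
y     = [20, 23, 26, 30]   # actual sales (hypothetical validation data)
y_hat = [19, 25, 24, 31]   # model predictions (hypothetical)

n = len(y)
errors = [yh - yi for yi, yh in zip(y, y_hat)]

mae = sum(abs(e) for e in errors) / n          # mean absolute error
mse = sum(e ** 2 for e in errors) / n          # mean squared error
rmse = math.sqrt(mse)                          # back in original units

mean_y = sum(y) / n
ss_res = sum(e ** 2 for e in errors)           # residual sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
r2 = 1 - ss_res / ss_tot                       # proportion of variance explained

print(mae, mse, round(rmse, 3), round(r2, 3))
```

Note how MSE and RMSE penalize the two larger errors (±2) more heavily than MAE does.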
Iterative Training
Regression models are rarely perfect on the first try. Data scientists often repeat the process by:
Selecting and transforming features,
Choosing different algorithms like linear regression or polynomial regression,
Tuning hyperparameters, which are specific numeric settings for the algorithm.
After multiple repetitions, the best-performing model is chosen for deployment.
In Short
Regression models predict numeric outcomes by learning patterns in historical data. Evaluation metrics like MAE, RMSE, and R² help ensure predictions are accurate and reliable for real-world situations.
Binary Classification in Machine Learning
Binary classification is a type of supervised learning in which the model predicts one of two possible outcomes: true/false or yes/no. Like regression, it involves an iterative process of training, validating, and evaluating a model.
Instead of predicting numeric values, binary classification models calculate probabilities for class assignment and use metrics that compare predicted labels to actual labels.
Example: Predicting Diabetes
Suppose we want to predict whether a patient has diabetes based on blood glucose levels:
| Blood Glucose (x) | Diabetic? (y) |
|-------------------|---------------|
| 67 | 0 |
| 103 | 1 |
| 114 | 1 |
| 72 | 0 |
| 116 | 1 |
| 65 | 0 |
We can train a logistic regression model that creates a sigmoid (S-shaped) function representing the probability that a patient has diabetes:
f(x) = P(y=1|x)
A threshold, usually set at 0.5, determines the predicted class:
Probability ≥ 0.5 → predict 1 (diabetic)
Probability < 0.5 → predict 0 (non-diabetic)
For example, a patient with a blood glucose level of 90 may have P(y=1) = 0.9, so they are predicted as diabetic.
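The sigmoid-plus-threshold step can be sketched in a few lines. The weight and bias below are made-up values for illustration, not parameters actually fitted to the glucose table above:

```python
import math

# Sketch of logistic-regression inferencing. The weight and bias are
# illustrative values, not a model trained on the glucose table above.
w, b = 0.1, -9.0   # hypothetical learned parameters

def sigmoid(z: float) -> float:
    """S-shaped function mapping any real number to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict(glucose: float, threshold: float = 0.5) -> tuple[float, int]:
    p = sigmoid(w * glucose + b)     # P(y=1 | x)
    return p, int(p >= threshold)    # apply the 0.5 decision threshold

p, label = predict(103)
print(p, label)   # probability above 0.5 -> predicted diabetic (1)
```

With these parameters a glucose level of 103 falls above the threshold (predicted 1), while 67 falls below it (predicted 0), matching the pattern in the table.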
Evaluating a Binary Classification Model
We use a validation dataset to test the model. Predicted labels (ŷ) are compared to actual labels (y) using a confusion matrix:

|          | Predicted 0 | Predicted 1 |
|----------|-------------|-------------|
| Actual 0 | TN          | FP          |
| Actual 1 | FN          | TP          |
Where:
TN: True Negatives
FP: False Positives
FN: False Negatives
TP: True Positives
Understanding TN, FP, FN, and TP in Classification
When we evaluate a classification model, we compare the predicted labels (ŷ) with the actual labels (y). This comparison is shown in a confusion matrix, which helps us see how the model is doing.
TP (True Positives): These are cases where the model correctly predicts the positive class.
Example: A patient has diabetes, and the model predicts diabetic → ✅ TP
TN (True Negatives): These are cases where the model correctly predicts the negative class.
Example: A patient does not have diabetes, and the model predicts non-diabetic → ✅ TN
FP (False Positives): These are cases where the model predicts positive, but the actual class is negative.
Example: A patient does not have diabetes, but the model predicts diabetic → ❌ FP
This is also called a “Type I error.”
FN (False Negatives): These are cases where the model predicts negative, but the actual class is positive.
Example: A patient has diabetes, but the model predicts non-diabetic → ❌ FN
This is also called a “Type II error.”
Key Metrics
1. Accuracy – Proportion of correct predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example: Accuracy = 0.83 → 83% correct predictions
2. Recall (Sensitivity) – Proportion of actual positives correctly identified:
Recall = TP / (TP + FN)
Example: Recall = 0.75 → 75% of diabetic patients correctly identified
3. Precision – Proportion of predicted positives that are actually positive:
Precision = TP / (TP + FP)
Example: Precision = 1.0 → 100% of predicted diabetic patients were correct
4. F1-Score – Harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example: F1 = 0.86
5. Area Under the Curve (AUC) – Evaluates model performance across thresholds using the ROC curve.
AUC = 1 → perfect model
AUC = 0.5 → random guessing
Example: AUC = 0.875 → the model performs much better than random guessing
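The first four metrics follow directly from confusion-matrix counts. The counts below (TP=3, TN=2, FP=0, FN=1) are one set consistent with the example values quoted above (0.83, 0.75, 1.0, 0.86), chosen here for illustration:

```python
# Compute accuracy, recall, precision, and F1 from confusion-matrix counts.
# These counts are illustrative values consistent with the examples above.
tp, tn, fp, fn = 3, 2, 0, 1

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 5 of 6 predictions correct
recall    = tp / (tp + fn)                    # share of actual positives found
precision = tp / (tp + fp)                    # share of predicted positives correct
f1        = 2 * (precision * recall) / (precision + recall)

print(round(accuracy, 2), recall, precision, round(f1, 2))
```

Note the trade-off: this model never raises a false alarm (precision = 1.0) but misses one diabetic patient (recall = 0.75); F1 balances the two.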
In Short
Binary classification predicts one of two outcomes based on feature values. Evaluation metrics like accuracy, precision, recall, F1-score, and AUC ensure that the model reliably identifies both positive and negative cases.
Multiclass Classification in Machine Learning
Multiclass classification is a supervised learning method used to predict which of several possible classes an observation belongs to. Like regression and binary classification, it follows the train, validate, evaluate process.
Example: Classifying Penguin Species
Suppose we have penguins and measure their flipper length (x). The species (y) are encoded as:
0 → Adelie
1 → Gentoo
2 → Chinstrap
| Flipper Length (x) | Species (y) |
|---------------------|-------------|
| 167 | 0 |
| 172 | 0 |
| 225 | 2 |
| 197 | 1 |
| 189 | 1 |
| 232 | 2 |
| 158 | 0 |
Algorithms for Multiclass Classification
1. One-vs-Rest (OvR)
This approach trains one binary classifier for each class. Each function calculates the probability that an observation belongs to its class compared to all other classes.
Example for penguins:
\( f_0(x) = P(y=0|x), f_1(x) = P(y=1|x), f_2(x) = P(y=2|x) \)
The predicted class is the one with the highest probability.
2. Multinomial Algorithms
A single function produces a probability vector for all classes, which sums to 1.
Example (Softmax output):
\( f(x) = [0.2, 0.3, 0.5] \)
Predicted class = 2
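The softmax step can be sketched as follows. The raw class scores here are made-up model outputs, chosen so the resulting probabilities round to the [0.2, 0.3, 0.5] vector in the example above:

```python
import math

# Softmax turns raw class scores into a probability vector that sums to 1.
# The scores are hypothetical model outputs for [Adelie, Gentoo, Chinstrap].
def softmax(scores: list[float]) -> list[float]:
    m = max(scores)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.2, 1.6, 2.1]                       # hypothetical raw outputs
probs = softmax(scores)
predicted_class = probs.index(max(probs))      # class with highest probability

print([round(p, 2) for p in probs], predicted_class)
```

Because softmax is monotonic, the predicted class is simply the one with the largest raw score; the probabilities add interpretability on top.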
Evaluating a Multiclass Classifier
Evaluation uses a confusion matrix to show counts of predicted versus actual class labels. Metrics can be calculated for each class or overall.
| Flipper Length (x) | Actual Species (y) | Predicted (ŷ) |
|---------------------|---------------------|----------------|
| 165 | 0 | 0 |
| 171 | 0 | 0 |
| 205 | 2 | 1 |
| 195 | 1 | 1 |
| 183 | 1 | 1 |
| 221 | 2 | 2 |
| 214 | 2 | 2 |
Per-Class Metrics Example:
| Class | TP | TN | FP | FN | Accuracy | Recall | Precision | F1-Score |
|-------|----|----|----|----|----------|--------|-----------|----------|
| 0 | 2 | 5 | 0 | 0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | 2 | 4 | 1 | 0 | 0.86 | 1.0 | 0.67 | 0.8 |
| 2 | 2 | 4 | 0 | 1 | 0.86 | 0.67 | 1.0 | 0.8 |
Overall Metrics:
Accuracy = 0.90, Recall = 0.86, Precision = 0.86, F1-score = 0.86
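The per-class counts in the table above can be derived directly from the actual/predicted pairs by treating each class one-vs-rest. A sketch using the seven validation rows:

```python
# Derive per-class TP/TN/FP/FN from the actual vs. predicted labels
# in the validation table above (one-vs-rest for each class).
actual    = [0, 0, 2, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2]

def per_class_counts(cls: int) -> dict[str, int]:
    tp = sum(1 for y, p in zip(actual, predicted) if y == cls and p == cls)
    tn = sum(1 for y, p in zip(actual, predicted) if y != cls and p != cls)
    fp = sum(1 for y, p in zip(actual, predicted) if y != cls and p == cls)
    fn = sum(1 for y, p in zip(actual, predicted) if y == cls and p != cls)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

for cls in (0, 1, 2):
    print(cls, per_class_counts(cls))
```

The single misclassified penguin (flipper length 205, actual class 2, predicted class 1) shows up twice: as the FP for class 1 and the FN for class 2.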
Key Takeaways
Multiclass classification predicts among more than two classes. Algorithms include One-vs-Rest and multinomial (softmax). Evaluation uses confusion matrices and metrics like accuracy, precision, recall, and F1-score. The predicted class is the one with the highest probability.
Clustering in Machine Learning
Clustering is a type of unsupervised machine learning where observations are grouped into clusters based on similarities in their features. Unlike supervised learning, this method does not require pre-labeled data; the model identifies patterns solely from the input features.
Example: Grouping Flowers
A botanist records the number of leaves (x1) and the number of petals (x2) for flowers:
| Leaves (x1) | Petals (x2) |
|-------------|-------------|
| 0 | 5 |
| 0 | 6 |
| 1 | 3 |
| 1 | 3 |
| 1 | 6 |
| 1 | 8 |
| 2 | 3 |
| 2 | 7 |
| 2 | 8 |
The goal is to group similar flowers together based on their features without knowing their species.
Training a Clustering Model
K-Means Clustering, the most common algorithm, works as follows:
Vectorize features: Represent each observation as an n-dimensional vector. For flowers, this means [leaves, petals].
Choose k clusters: Decide how many clusters (k) you want. Plot k random points as centroids.
Assign points: Each data point is assigned to the nearest centroid.
Move centroids: Update centroid positions to the average location of the assigned points.
Reassign points: Points may now be closer to a different centroid, so reassign them accordingly.
Iterate: Repeat the process of moving centroids and reassigning points until the clusters stabilize or the maximum number of iterations is reached.
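The K-Means loop above can be sketched in a few lines on the flower data. This is an illustration, not a production implementation: the initial centroids are fixed by hand (an assumption, rather than the usual random placement) so the run is repeatable:

```python
# Minimal K-Means sketch on the flower data above (features: [leaves, petals]).
# Initial centroids are hand-picked (an assumption) so the run is repeatable.
points = [(0, 5), (0, 6), (1, 3), (1, 3), (1, 6), (1, 8), (2, 3), (2, 7), (2, 8)]
centroids = [(0.0, 3.0), (2.0, 8.0)]   # k = 2 starting centroids

def nearest(p, cents):
    # index of the closest centroid (squared Euclidean distance)
    return min(range(len(cents)),
               key=lambda i: (p[0] - cents[i][0]) ** 2 + (p[1] - cents[i][1]) ** 2)

for _ in range(10):                     # iterate until stable (10 is plenty here)
    assign = [nearest(p, centroids) for p in points]          # assign points
    new_centroids = []
    for i in range(len(centroids)):                           # move centroids
        members = [p for p, a in zip(points, assign) if a == i]
        new_centroids.append((sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members)))
    if new_centroids == centroids:      # clusters have stabilized
        break
    centroids = new_centroids

print(assign)      # cluster assignment for each flower
print(centroids)   # final centroid positions
```

On this data the algorithm converges quickly, splitting the flowers into a few-petals cluster and a many-petals cluster regardless of leaf count.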
Evaluating a Clustering Model
Since there are no true labels, evaluation focuses on cluster separation:
Average distance to cluster center: This measures how close points are to their centroid.
Average distance to other centers: This measures how far points are from the centroids of other clusters.
Maximum distance to cluster center: This indicates the furthest point from its cluster centroid.
Silhouette Score: This score ranges from -1 to 1. Higher values indicate better separation between clusters.
Key Takeaways
Clustering is unsupervised and groups observations based on feature similarity.
K-Means is a popular algorithm that updates centroids and point assignments iteratively.
Evaluation focuses on distance metrics and the silhouette score, rather than on accuracy.
Clustering is useful for customer segmentation, document grouping, and discovering patterns.
Deep Learning in Machine Learning
Deep learning is a complex type of machine learning that mimics how the human brain learns. It employs artificial neural networks (ANNs), which are mathematical models inspired by biological neurons, to process input data and make predictions.
(Figure: a biological neural network shown alongside an artificial neural network.)
Biological Neural Network: Neurons fire in response to stimuli and pass signals to connected neurons.
Artificial Neural Network: Each neuron applies a function to input \( x \) and a weight \( w \), passing the result through an activation function.
Multiple layers of neurons create a deep neural network (DNN).
Deep learning models can tackle regression, classification, natural language processing, computer vision, and more.
How Deep Learning Works
Input features (\( x \)): Each observation is represented as a vector of features.
Example: For a penguin classification model:
\[ x = [\text{bill length}, \text{bill depth}, \text{flipper length}, \text{weight}] \]
Neural network layers: Each layer applies weights and activation functions to the inputs and sends outputs to the next layer.
Output layer: Produces predictions as probabilities for each class (using softmax for classification). Example:
\[ y = [P(Adelie), P(Gentoo), P(Chinstrap)] = [0.2, 0.7, 0.1] \]
The class with the highest probability is the predicted label.
How Neural Networks Learn
Weights (\( w \)) decide how inputs are changed in the network.
Training process:
Feed training features through the network to get predicted outputs \( \hat{y} \).
Compare \( \hat{y} \) to the actual labels \( y \) using a loss function.
Use optimization methods (like gradient descent) to modify weights to reduce loss.
Backpropagate changes through the network and repeat for multiple epochs until the predictions are sufficiently accurate.
Batch processing: Data is often processed in matrices to speed up training, especially on GPUs designed for linear algebra.
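A single forward pass through a tiny network can be sketched as follows. Everything here is hypothetical: the weights are made-up values rather than a trained model, and the input is an arbitrary 4-feature vector standing in for penguin measurements:

```python
import math

# Forward pass through a tiny one-hidden-layer network with made-up weights.
# Input: 4 features; output: probabilities for 3 classes.
def relu(v):
    """Activation function: pass positives through, clamp negatives to 0."""
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """Fully connected layer: one row of input weights per output neuron."""
    return [sum(wi * xi for wi, xi in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

x = [37.3, 16.8, 19.2, 30.0]   # hypothetical 4-feature input

# Hypothetical weights: 2 hidden neurons, then 3 output neurons.
w1 = [[0.01, 0.02, -0.01, 0.03], [-0.02, 0.01, 0.02, 0.01]]
b1 = [0.0, 0.0]
w2 = [[0.5, -0.3], [0.2, 0.8], [-0.4, 0.1]]
b2 = [0.0, 0.0, 0.0]

hidden = relu(dense(x, w1, b1))          # hidden layer activations
probs = softmax(dense(hidden, w2, b2))   # class probabilities, sum to 1

print([round(p, 2) for p in probs])
predicted = probs.index(max(probs))
```

Training would repeat this forward pass, compare `probs` to the true label with a loss function, and backpropagate to adjust `w1`, `b1`, `w2`, and `b2`.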
Example: Penguin Species Classification
Input vector: [37.3, 16.8, 19.2, 30.0]
Output probabilities: [0.2, 0.7, 0.1] → Predicted species: Gentoo
The network repeatedly adjusts its weights to improve predictions over several training iterations.
Key Takeaways
Deep learning uses multi-layer neural networks to model complex relationships in data.
It works for classification, regression, and specific tasks like NLP and image recognition.
Training includes forward pass, loss calculation, backpropagation, and weight optimization.
GPUs speed up training due to matrix-based computations.