Simulation of human intelligence in machines that are programmed to think, learn, and make decisions autonomously
These systems are designed to mimic human cognitive functions like problem-solving, language understanding, learning from experience, and pattern recognition
Branch of computer science dealing with the simulation of intelligent behavior in computers
Study of algorithms that learn patterns from data instead of being explicitly programmed
Application of AI that gives systems the ability to learn on their own and improve from experience without being explicitly programmed
\[ Total Error = Bias^2 + Variance + Irreducible Error\]
\[Bias^2\] : error from incorrect model assumptions
Variance : error from sensitivity to small fluctuations in the training data
Irreducible Error : random noise in the data that cannot be eliminated
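The decomposition above can be estimated by simulation. The following is a minimal sketch, not a definitive procedure: it assumes a known true function (a sine curve), a fixed training grid, Gaussian noise, and polynomial models fitted with `np.polyfit`. The helper names `true_fn` and `bias_variance` and all constants are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_SD = 0.3  # assumed noise level; NOISE_SD**2 is the irreducible error

def true_fn(x):
    # Assumed ground-truth function, chosen only for illustration
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_repeats=200, n_train=30):
    """Estimate bias^2 and variance of a degree-`degree` polynomial fit."""
    x_train = np.linspace(0, 1, n_train)  # fixed design, only the noise changes
    x_test = np.linspace(0, 1, 50)
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        y_train = true_fn(x_train) + rng.normal(0, NOISE_SD, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[r] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)  # squared bias
    variance = np.mean(preds.var(axis=0))                  # spread across refits
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 7)}
for d, (b, v) in results.items():
    total = b + v + NOISE_SD ** 2
    print(f"degree={d}: bias^2={b:.4f} variance={v:.4f} total={total:.4f}")
```

In this setup the low-degree model should show the larger bias and the high-degree model the larger variance, which is the trade-off the formula describes.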
from sklearn.model_selection import train_test_split
# Define the feature(s) and target variable
y_col = "price" # define y column
X = data_df.drop(y_col, axis=1) # drop y column from features data
y = data_df[y_col] # select the target column by its name
# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("number of test samples :", X_test.shape[0])
print("number of training samples:",X_train.shape[0])
model_selection.train_test_split - split arrays or matrices into random train and test subsets
X, y : the allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes
test_size : if float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int (integer), it represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25
random_state : controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls
\[ \text{boxcox}(y_i) = \frac{y_i^{\lambda} - 1}{\lambda}\]
scipy.stats.boxcox - return a dataset transformed by a Box-Cox power transformation
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats.mstats import normaltest # D'Agostino K^2 Test
normaltest(data_df.target.values)
log_target = np.log(data_df.target)
log_target.hist();
sqrt_target = np.sqrt(data_df.target)
sqrt_target.hist();
from scipy.stats import boxcox
bc_result = boxcox(data_df.target)
boxcox_target = bc_result[0]
lam = bc_result[1]
plt.hist(boxcox_target);
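To connect the Box-Cox formula to `scipy.stats.boxcox`, the sketch below applies the formula by hand and compares it with the library result. The synthetic log-normal data and the helper name `boxcox_manual` are assumptions for illustration. The helper also handles the standard limiting case λ = 0 (the natural log), which the formula above does not spell out.

```python
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
# Synthetic, strictly positive, right-skewed data (Box-Cox requires y > 0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=200)

def boxcox_manual(y, lam):
    """Apply (y^lambda - 1) / lambda, using log(y) in the limit lambda = 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam

# With no lambda given, scipy returns the transformed data and the fitted lambda
transformed, lam = boxcox(y)
manual = boxcox_manual(y, lam)

print("fitted lambda:", lam)
print("max abs difference:", np.max(np.abs(transformed - manual)))
```

The two results should agree up to floating-point error, since the library applies the same transformation after choosing λ by maximum likelihood.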
Statistical method used to model the relationship between a dependent (target or outcome) variable and one or more independent (predictors or features) variables
Predicts continuous output variables from the independent input variables
Goal is to understand how changes in the independent variables are associated with changes in the dependent variable and to make predictions about the dependent variable based on known values of the independent variables
Statistical method used in machine learning and data science to assess the performance of a model by splitting the data into multiple subsets (folds) and training/testing the model on different combinations of these subsets.
The purpose is to provide a more accurate and robust estimate of the model’s performance on unseen data, helping to detect overfitting and assess how well the model generalizes
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# Sample data
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
# Model
model = LinearRegression()
# KFold Cross-Validation
kfold = KFold(n_splits=5)
results = cross_val_score(model, X, y, cv=kfold, scoring='neg_mean_squared_error')
# Average MSE across folds
average_mse = -results.mean()
print("Average MSE:", average_mse)
model_selection.KFold - K-Fold cross-validator; creates the given number of k-fold splits, allowing cross-validation
n_splits - number of folds, must be at least 2
model_selection.cross_val_score - evaluates a model's score through cross-validation
estimator - the object to use to fit the data
X, y - the data to fit, the target variable to try to predict in the case of supervised learning
cv - determines the cross-validation splitting strategy (None, int, CV splitter)
scoring - a str or a scorer callable object / function with signature scorer(estimator, X, y) which should return only a single value
model_selection.cross_val_predict - produces the out-of-fold prediction for each row
model_selection.GridSearchCV - scans over a parameter grid to select the hyperparameter set with the best out-of-sample score
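The two remaining helpers, `GridSearchCV` and `cross_val_predict`, can be combined as in the sketch below. The synthetic dataset, the `Ridge` estimator, and the `alpha` grid are assumptions chosen only to make the example self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict

# Synthetic data (illustrative only)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Try each alpha with 5-fold CV and keep the one with the best mean score
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("Best alpha:", search.best_params_["alpha"])
print("Best CV MSE:", -search.best_score_)

# One prediction per row, each made by a model that did not see that row
oof_pred = cross_val_predict(search.best_estimator_, X, y, cv=cv)
print("Out-of-fold predictions shape:", oof_pred.shape)
```

Because the scoring is a negated error, the reported score is flipped back to a positive MSE before printing.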
Technique used to prevent overfitting by penalizing high-valued coefficients (reduces parameters and shrinks the model)
Especially useful in complex models like high-degree polynomial regression, deep neural networks, and decision trees, where the risk of overfitting is high
Modifies the cost function used to train the model by adding a penalty term, typically multiplied by a hyperparameter λ (regularization strength)
Regularization techniques have an analytical, a geometric, and a probabilistic interpretation
\[ \text{L2 penalty} = \lambda \sum_{i=1}^{n} w_i^2 \]
\[ \text{L1 penalty} = \lambda \sum_{i=1}^{n} |w_i| \]
\[ \text{Elastic Net penalty} = \lambda_1 \sum_{i=1}^{n} |w_i| + \lambda_2 \sum_{i=1}^{n} w_i^2 \]
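The three penalty terms above translate directly into code. This is a small sketch with a made-up weight vector and made-up regularization strengths; the function names are illustrative.

```python
import numpy as np

def l2_penalty(w, lam):
    # lambda * sum of squared weights (Ridge)
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    # lambda * sum of absolute weights (Lasso)
    return lam * np.sum(np.abs(w))

def elastic_net_penalty(w, lam1, lam2):
    # weighted combination of the L1 and L2 terms
    return lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical model weights

l2 = l2_penalty(w, lam=0.1)
l1 = l1_penalty(w, lam=0.1)
en = elastic_net_penalty(w, lam1=0.1, lam2=0.1)
print("L2 penalty:", l2)
print("L1 penalty:", l1)
print("Elastic Net penalty:", en)
```

The large weight (-2.0) dominates the L2 term because it is squared, while the L1 term grows only linearly with it.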
\[ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \]
\[n: \text{Total number of observations}\]
\[y_i: \text{Actual value for the } i^{th} \text{ data point}\]
\[\hat{y}_i: \text{Predicted value for the } i^{th} \text{ data point}\]
\[\left| y_i - \hat{y}_i \right|: \text{Absolute error for the } i^{th} \text{ prediction}\]
Actual (y) | Predicted (y_hat) | Error (y-y_hat) | Absolute Error |
---|---|---|---|
3.0 | 2.5 | 0.5 | 0.5 |
5.0 | 4.5 | 0.5 | 0.5 |
2.0 | 2.5 | -0.5 | 0.5 |
7.0 | 6.0 | 1.0 | 1.0 |
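The table above can be reproduced in a few lines. The sketch computes MAE by hand from the table's four rows and checks it against `sklearn.metrics.mean_absolute_error`; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Values taken from the table above
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 4.5, 2.5, 6.0])

# Manual MAE: mean of the absolute errors (0.5 + 0.5 + 0.5 + 1.0) / 4
mae_manual = np.mean(np.abs(y_true - y_pred))

# Same metric via scikit-learn
mae_sklearn = mean_absolute_error(y_true, y_pred)

print("MAE (manual): ", mae_manual)
print("MAE (sklearn):", mae_sklearn)
```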
\[ MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \]
\[n: \text{Total number of observations}\]
\[y_i: \text{Actual value for the } i^{th} \text{ data point}\]
\[\hat{y}_i: \text{Predicted value for the } i^{th} \text{ data point}\]
\[\left( y_i - \hat{y}_i \right)^2: \text{Squared error for the } i^{th} \text{ prediction}\]
Actual (y) | Predicted (y_hat) | Error (y-y_hat) | Squared Error |
---|---|---|---|
3.0 | 2.5 | 0.5 | 0.25 |
5.0 | 4.5 | 0.5 | 0.25 |
2.0 | 2.5 | -0.5 | 0.25 |
7.0 | 6.0 | 1.0 | 1.00 |
from sklearn.metrics import mean_squared_error
import numpy as np
# Example usage
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.1, 7.8])
# Calculate MSE with sklearn
mse_sklearn = mean_squared_error(y_true, y_pred)
print("Mean Squared Error:", mse_sklearn)
# Define function for MSE
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
# Calculate MSE
error = mse(y_true, y_pred)
print("Mean Squared Error:", error)
metrics.mean_squared_error(y_true, y_pred) - mean squared error regression loss
\[ RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } \]
\[n: \text{Total number of observations}\]
\[y_i: \text{Actual value for the } i^{th} \text{ data point}\]
\[\hat{y}_i: \text{Predicted value for the } i^{th} \text{ data point}\]
Actual (y) | Predicted (y_hat) | Error (y-y_hat) | Squared Error |
---|---|---|---|
3.0 | 2.5 | 0.5 | 0.25 |
5.0 | 4.5 | 0.5 | 0.25 |
2.0 | 2.5 | -0.5 | 0.25 |
7.0 | 6.0 | 1.0 | 1.00 |
import numpy as np
from sklearn.metrics import mean_squared_error
# Example usage
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.1, 7.8])
# Calculate RMSE with sklearn
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print("Root Mean Squared Error:", rmse_sklearn)
# Define function for RMSE
def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
# Calculate RMSE
error = rmse(y_true, y_pred)
print("Root Mean Squared Error:", error)
\[ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \]
\(SS_{res} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2\) : Residual Sum of Squares (the sum of squared errors between actual and predicted values).
\(SS_{tot} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2\) : Total Sum of Squares (the total variance of the actual values from their mean).
\[y_i: \text{Actual value for the } i^{th} \text{ data point}\]
\[\hat{y}_i: \text{Predicted value for the } i^{th} \text{ data point}\]
\[\bar{y}: \text{Mean of the actual values}\]
Actual (y) | Predicted (y_hat) | Mean of Actual (y_bar) |
---|---|---|
3.0 | 2.5 | 4.25 |
5.0 | 4.5 | 4.25 |
2.0 | 2.5 | 4.25 |
7.0 | 6.0 | 4.25 |
import numpy as np
from sklearn.metrics import r2_score
# Example true and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
# Calculate R-squared
r_squared = r2_score(y_true, y_pred)
print("R-squared:", r_squared)
# Convert to numpy arrays
y_true = np.array(y_true)
y_pred = np.array(y_pred)
# Calculate the mean of the true values
y_mean = np.mean(y_true)
# Calculate the total sum of squares (TSS) and residual sum of squares (RSS)
ss_total = np.sum((y_true - y_mean) ** 2)
ss_residual = np.sum((y_true - y_pred) ** 2)
# Calculate R-squared
r_squared = 1 - (ss_residual / ss_total)
print("R-squared:", r_squared)
metrics.r2_score(y_true, y_pred) - R2 (coefficient of determination) regression score function (best possible score is 1.0)
\[ R^2_{adj} = 1 - \left( \frac{SS_{res} / (n - k - 1)}{SS_{tot} / (n - 1)} \right) \]
\(SS_{res} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2\) : Residual Sum of Squares.
\(SS_{tot} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2\) : Total Sum of Squares.
\[n: \text{Total number of observations}\]
\[k: \text{Total number of predictors}\]
import numpy as np
from sklearn.metrics import r2_score
# Example true and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.1, 7.8]
# Calculate R-squared
r_squared = r2_score(y_true, y_pred)
print("R-squared:", r_squared)
# Number of observations and predictors
n = len(y_true) # number of data points
k = 1 # number of predictors (set to 1 for simplicity, adjust for more features)
# Calculate Adjusted R-squared
r_squared_adjusted = 1 - ((1 - r_squared) * (n - 1) / (n - k - 1))
print("Adjusted R-squared:", r_squared_adjusted)
Model the relationship between a dependent (response/target) continuous variable and one or more independent (predictors/features) variables by fitting a linear equation to observed data
Simple Linear Regression involves a single independent variable (relationship between this variable and the target is represented as a straight line)
Multiple Linear Regression extends simple linear regression by including multiple predictors
Used when the relationship between the dependent and independent variables is linear, the dataset is small to medium-sized without too many complex features, outliers are minimal, and the assumptions of linearity and homoscedasticity are reasonable
\( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon \)
\[y\] : dependent (response/target) variable we aim to predict (changes when there is any change in the values of the independent variables)
\[x_i\] : independent (predictors/features) variables (does not change based on the effects of other variables)
\[\beta_0\] : y-intercept (the value of y when all \[x_i=0\])
\[\beta_i\] : coefficients corresponding to each independent variable (change (increase/decrease) in y for a unit increase in \[x_i\])
\[\epsilon\] : error term (variability in y that cannot be explained by the linear relationship with x)
Ordinary Least Squares (OLS) minimizes the sum of squared residuals (errors), i.e. the sum of squared differences between observed and predicted values
\( \text{Minimize } \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \)
\[y_i\] : actual value
\[\hat{y}_i\] : predicted value
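The OLS objective can be solved directly as a least-squares problem. The sketch below is an illustration under stated assumptions (synthetic data with made-up true coefficients). It solves for the coefficients with `np.linalg.lstsq` and checks that `LinearRegression` arrives at the same values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic data: y = 4 + 2*x1 - 1*x2 + noise (coefficients are made up)
X = rng.uniform(0, 10, size=(50, 2))
y = 4.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 50)

# Add a column of ones so the first coefficient plays the role of beta_0
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes sum((y - X_design @ beta) ** 2)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The same fit with scikit-learn
model = LinearRegression().fit(X, y)

print("lstsq intercept, coefs:  ", beta[0], beta[1:])
print("sklearn intercept, coefs:", model.intercept_, model.coef_)
```

Both approaches minimize the same sum of squared residuals, so the estimates should match up to numerical precision.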
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
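# Hypothetical example data (illustrative values only; replace with your own dataset)
data = {
    "Advertising": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "Sales": [25, 44, 68, 81, 105, 118, 142, 165, 179, 204],
}
df = pd.DataFrame(data)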
# Define predictor and response variable
X = df[['Advertising']]
y = df['Sales']
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and fit the model
model = LinearRegression()
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)
linear_model.LinearRegression - ordinary least squares Linear Regression
coef_ - estimated coefficients for the linear regression problem
intercept_ - independent term in the linear model
fit(X, y) - fit linear model, X: training data; y: target values
predict(X) - predict using the linear model
score(X, y) - return the coefficient of determination R2 of the prediction
Type of regression analysis in which the relationship between the independent variable (predictor) and the dependent variable (response) is modeled as an n-th degree polynomial
Polynomial regression is capable of capturing non-linear relationships by fitting a curved line to the data
As the degree n increases, the curve becomes more flexible, allowing the model to capture more complex patterns in the data
Used when relationship between variables is non-linear, but still smooth (can be captured by a polynomial curve)
\( y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \epsilon \)
\[y\] : dependent (response/target) variable
\[x\] : independent (predictor/feature) variable
\[\beta_i\] : coefficients of the polynomial
\[n\] : degree of the polynomial (quadratic for n=2, cubic for n=3)
\[\epsilon\] : error term
Ordinary Least Squares (OLS) minimizes the sum of squared residuals (errors), i.e. the sum of squared differences between observed and predicted values
\( \text{Minimize } \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2, \quad \hat{y}_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_n x_i^n \)
\[y_i\] : actual value
\[\hat{y}_i\] : predicted value
\[m\] : number of observations (distinct from the polynomial degree n)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81])
# Transform the data to include polynomial terms (e.g., x, x^2)
poly = PolynomialFeatures(degree=2) # Change degree as needed
x_poly = poly.fit_transform(x)
# Fit the polynomial regression model
model = LinearRegression()
model.fit(x_poly, y)
# Predict values
y_pred = model.predict(x_poly)
# Evaluation
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)
# Plotting the results
plt.scatter(x, y, color='blue', label='Actual data')
plt.plot(x, y_pred, color='red', label='Polynomial regression fit')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
preprocessing.PolynomialFeatures - generate polynomial and interaction features
degree - int specifies the maximal degree of the polynomial features, or tuple for (min_degree, max_degree)
fit_transform - fit to data, then transform it (X: input samples; y: target values)
Linear regression technique that includes a regularization term to mitigate overfitting, especially when working with multicollinear data
Penalty is added to the linear regression objective function based on the squared magnitude of the coefficients, effectively “shrinking” coefficients towards zero but never allowing them to be exactly zero
Used for scenarios with multicollinearity among features, high-dimensional data where overfitting is a concern, situations where interpretability is less critical than predictive performance
\[ \text{Cost} = \text{MSE} + \lambda \sum_{j=1}^{p} w_j^2 \]
\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\] : the Mean Squared Error
\[\lambda\] : regularization parameter that controls the strength of the penalty
\[w_j\] : coefficients (weights) of the model
As \(\lambda\) increases, the influence of the regularization term also increases, forcing the values of the coefficients \(w_j\) to shrink, but not to exactly zero
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate or load data
X = np.random.rand(100, 3) # example features
y = X @ np.array([3, 5, 2]) + np.random.normal(0, 1, 100) # example target with added noise
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Scale the features (important for regularized models)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fit the Ridge Regression model
ridge = Ridge(alpha=1.0) # alpha is the regularization parameter (lambda)
ridge.fit(X_train_scaled, y_train)
# Predict and evaluate the model
y_pred = ridge.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Model Coefficients:", ridge.coef_)
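The shrinkage effect of the regularization strength can be observed by refitting Ridge over a range of `alpha` values. This is a sketch on synthetic data; the alpha grid and the variable names `norms` and `last_coef` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data with made-up true coefficients [3, 5, 2]
X = rng.random((100, 3))
y = X @ np.array([3.0, 5.0, 2.0]) + rng.normal(0, 1, 100)
X_scaled = StandardScaler().fit_transform(X)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X_scaled, y).coef_
    norms.append(np.linalg.norm(coef))  # overall size of the weight vector
    print(f"alpha={alpha:>8}: coef={np.round(coef, 4)}")

last_coef = coef  # coefficients at the strongest penalty
```

The coefficient norm should fall as `alpha` grows, while the individual coefficients stay non-zero even at the largest value.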
linear_model.Ridge - linear least squares with l2 regularization
alpha - constant that multiplies the L2 term, controlling regularization strength
coef_ - weight vector(s)
intercept_ - independent term in decision function
fit(X, y) - fit Ridge regression model, X: training data; y: target values
predict(X) - predict using the linear model
score(X, y) - return the coefficient of determination R2 of the prediction
Regression technique that introduces a penalty equal to the absolute value of the magnitude of coefficients
Uses an L1 penalty (the sum of absolute values of coefficients) that results in feature selection as it drives some coefficients to zero, effectively removing less important features from the model
Used when feature selection is required (high-dimensional datasets with many irrelevant features), sparse solutions are desirable (only a few significant features should have non-zero coefficients), data overfitting is a concern
\[ \text{Cost} = \text{MSE} + \lambda \sum_{j=1}^{p} |w_j| \]
\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\] : the Mean Squared Error
\[\lambda\] : regularization parameter that controls the strength of the penalty
\[w_j\] : coefficients (weights) of the model
As \(\lambda\) increases the penalty term grows, shrinking coefficients toward zero; when \(\lambda\) is sufficiently large, some coefficients are forced to exactly zero, resulting in automatic feature selection
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate or load sample data
X = np.random.rand(100, 3) # example features
y = X @ np.array([3, 5, 0]) + np.random.normal(0, 1, 100) # target with sparse signal
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Scale the features (important for regularized models)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fit the Lasso Regression model
lasso = Lasso(alpha=1.0) # alpha is the regularization parameter (lambda)
lasso.fit(X_train_scaled, y_train)
# Predict and evaluate the model
y_pred = lasso.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
print("Model Coefficients:", lasso.coef_)
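The feature-selection behaviour of the L1 penalty can be checked by counting zero coefficients as `alpha` grows. This is a sketch on synthetic data where only the first two of five features carry signal; the alpha grid and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic data: only the first two features influence the target
X = rng.random((200, 5))
true_w = np.array([3.0, 5.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(0, 0.5, 200)
X_scaled = StandardScaler().fit_transform(X)

alphas = [0.001, 0.1, 0.5, 5.0]
zero_counts = []
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X_scaled, y).coef_
    zero_counts.append(int(np.sum(coef == 0)))  # features removed by the penalty
    print(f"alpha={alpha}: coef={np.round(coef, 3)}, zeros={zero_counts[-1]}")
```

With a very large `alpha` every coefficient is driven to zero and the model predicts only the intercept, so in practice `alpha` is usually tuned by cross-validation.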
linear_model.Lasso - linear model trained with L1 prior as regularizer
alpha - constant that multiplies the L1 term, controlling regularization strength
coef_ - weight vector(s)
intercept_ - independent term in decision function
fit(X, y) - fit Lasso regression model, X: training data; y: target values
predict(X) - predict using the linear model
score(X, y) - return the coefficient of determination R2 of the prediction
Non-parametric, instance-based algorithm that predicts the value of a target variable by averaging the values of the \(k\) closest data points (neighbors) to a given input point
Doesn’t assume any underlying distribution of the data or a linear relationship between variables (flexible choice, especially for non-linear datasets)
Used when the data has local patterns and a non-parametric model is needed; works well with small datasets but struggles with large ones, because each prediction requires searching the stored training data
\[ d(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2} \] : Euclidean Distance between two points
\[ d(x, x') = \sum_{i=1}^{n} |x_i - x_i'| \] : Manhattan Distance
\[ \hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i \] : Prediction for KNN Regression as the average of the target values of the \( k \) nearest neighbors
\[ \hat{y} = \frac{\sum_{i=1}^{k} \frac{y_i}{d(x, x_i)}}{\sum_{i=1}^{k} \frac{1}{d(x, x_i)}} \] : Weighted KNN prediction
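The distance and prediction formulas above can be applied by hand to a single query point. The sketch below uses a tiny made-up dataset and compares the manual results with `KNeighborsRegressor`. It assumes the query point does not coincide with a training point, since inverse-distance weights are undefined at distance zero.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny made-up dataset (one feature)
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0]])
y_train = np.array([10.0, 12.0, 15.0, 30.0, 33.0])
x_query = np.array([2.6])
k = 3

# Euclidean distance from the query to every training point
dist = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
nearest = np.argsort(dist)[:k]  # indices of the k closest neighbors

# Uniform KNN: plain average of the neighbors' targets
y_uniform = y_train[nearest].mean()

# Weighted KNN: neighbors weighted by inverse distance
weights = 1.0 / dist[nearest]
y_weighted = np.sum(weights * y_train[nearest]) / np.sum(weights)

# Same predictions via scikit-learn
knn_u = KNeighborsRegressor(n_neighbors=k, weights="uniform").fit(X_train, y_train)
knn_w = KNeighborsRegressor(n_neighbors=k, weights="distance").fit(X_train, y_train)
pred_u = knn_u.predict(x_query.reshape(1, -1))[0]
pred_w = knn_w.predict(x_query.reshape(1, -1))[0]

print("uniform :", y_uniform, pred_u)
print("weighted:", y_weighted, pred_w)
```

The weighted prediction sits closer to the target of the nearest neighbor than the uniform average does.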
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate or load sample data
X = np.arange(1, 21).reshape(-1, 1) # example feature (e.g., month)
y = np.random.normal(50, 10, 20) # example target (e.g., sales)
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Create and fit the KNN Regressor model
knn = KNeighborsRegressor(n_neighbors=3, weights='distance') # using weighted KNN
knn.fit(X_train_scaled, y_train)
# Predict and evaluate the model
y_pred = knn.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
neighbors.KNeighborsRegressor - regression based on k-nearest neighbors
n_neighbors - number of neighbors to use by default for kneighbors queries
weights - weight function used in prediction (uniform, distance or callable)
fit(X, y) - fit k-nearest neighbors regression model, X: training data; y: target values
predict(X) - predict the target for the provided data
score(X, y) - return the coefficient of determination R2 of the prediction
Type of Support Vector Machine (SVM) adapted for regression tasks, where the goal is to predict continuous values rather than classify data points
Attempts to find a function that fits the data within a specified margin of error (epsilon-insensitive margin), where the model disregards errors that fall within a distance 𝜖 of the true values
\[ L(y, f(x)) = \begin{cases} 0 & \text{if } |y - f(x)| \leq \epsilon \\ |y - f(x)| - \epsilon & \text{otherwise} \end{cases} \] : Epsilon-Insensitive Loss Function
\[ \min \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} \max(0, |y_i - f(x_i)| - \epsilon) \] : Objective function for SVR, balancing flatness and error minimization
\[ ||w||^2 \] : controls the flatness of the regression line
\[ C \] : regularization parameter that balances margin maximization and error minimization
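The epsilon-insensitive loss above is easy to express in code. This is a minimal sketch with made-up values; the function name `epsilon_insensitive_loss` is an illustrative choice and not part of scikit-learn.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # made-up targets
y_pred = np.array([1.05, 2.5, 3.0, 3.0])  # made-up predictions

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print("per-sample loss:", losses)
print("total loss:", losses.sum())
```

Predictions within 0.1 of the target contribute nothing, so only the second and fourth samples are penalized here.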
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate or load data
# X = features, y = target
X = np.arange(1, 21).reshape(-1, 1) # example feature (e.g., months)
y = np.random.normal(50, 10, 20) # example target (e.g., sales)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize data (SVR often performs better with scaled data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Fit SVR model with RBF kernel
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train_scaled, y_train)
# Predict and evaluate
y_pred = svr.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
sklearn.svm.SVR - epsilon-support vector regression
kernel - specifies the kernel type to be used in the algorithm (linear, poly, rbf, sigmoid, precomputed)
C - regularization parameter
epsilon - specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value
fit(X, y) - fit the SVM model according to the given training data, X: training vectors; y: target values
predict(X) - perform regression on samples in X
score(X, y) - return the coefficient of determination R2 of the prediction
Statistical and machine learning method commonly used for binary classification problems, where the goal is to predict the probability of a binary outcome (e.g., yes/no, true/false, 0/1) based on one or more predictor variables
Logistic regression is fundamentally a classification algorithm that applies a logistic (or sigmoid) function to estimate probabilities
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \] : logistic or sigmoid function is used to map any real-valued number to a probability between 0 and 1
\[ z = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p \] : linear combination for z
\[ P(y = 1 | X) = \sigma(z) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p)}} \] : The probability of the positive class (class 1)
\[ P(y = 0 | X) = 1 - \sigma(z) = \frac{e^{-(w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p)}}{1 + e^{-(w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p)}} \] : The probability of the negative class (class 0)
\[ y = \begin{cases} 1 & \text{if } P(y=1 | X) \geq 0.5 \\ 0 & \text{if } P(y=1 | X) < 0.5 \end{cases} \] : decision threshold (commonly 0.5) to classify the output
\[ \text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right] \] : Log Loss (Binary Cross-Entropy Loss) cost function
\[ \hat{w} = \arg \min_{w} -\sum_{i=1}^{n} \left[ y_i \cdot \log(\sigma(z_i)) + (1 - y_i) \cdot \log(1 - \sigma(z_i)) \right] \] : Maximum Likelihood Estimation - the objective of logistic regression is to maximize the likelihood of the observed data, often by minimizing the negative log likelihood
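The sigmoid, decision threshold, and log loss formulas above can be traced numerically. In the sketch below the weights `w0` and `w1` are hypothetical values picked by hand, not fitted, and the manual loss is checked against `sklearn.metrics.log_loss`.

```python
import numpy as np
from sklearn.metrics import log_loss

def sigmoid(z):
    # Maps any real number to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_log_loss(y_true, p):
    # Clip probabilities to avoid log(0)
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical weights (chosen by hand for illustration, not learned)
w0, w1 = -4.0, 1.0

x = np.array([1.1, 2.5, 3.3, 4.5, 5.1, 6.2, 7.4])
y = np.array([0, 0, 1, 1, 0, 1, 1])

z = w0 + w1 * x                  # linear combination
p = sigmoid(z)                   # P(y = 1 | x)
y_hat = (p >= 0.5).astype(int)   # apply the 0.5 decision threshold
loss = binary_log_loss(y, p)

print("probabilities:", np.round(p, 3))
print("predicted classes:", y_hat)
print("log loss (manual): ", loss)
print("log loss (sklearn):", log_loss(y, p))
```

Training a logistic regression amounts to searching for the weights that make this loss as small as possible.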
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
# Generate or load example data
X = [[1.1], [2.5], [3.3], [4.5], [5.1], [6.2], [7.4]] # example features
y = [0, 0, 1, 1, 0, 1, 1] # binary target
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize and train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
A supervised learning approach that determines the class label for an unlabeled test case.
Categorizing some unknown items into a discrete set of categories or "classes"
The target attribute is a categorical or discrete variable