Python Machine Learning – Polynomial Regression
Polynomial Regression is a form of regression analysis where the relationship between the independent variable $x$ and the dependent variable $y$ is modeled as an $n$-th degree polynomial in $x$. This technique is useful when the data exhibits a non-linear relationship that can be approximated by a polynomial.
1. Polynomial Regression Equation
In polynomial regression, the model fits a polynomial equation of degree $n$ to the data:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n$$

Where:
- $y$ is the dependent variable (target),
- $x$ is the independent variable (feature),
- $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients of the polynomial equation.
Polynomial regression is still considered a type of linear regression because the regression function is linear in the unknown coefficients $\beta_0, \ldots, \beta_n$, even though the relationship between $x$ and $y$ is non-linear.
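Because the model is linear in the coefficients, fitting it reduces to ordinary least squares on a design matrix whose columns are $1, x, x^2, \ldots, x^n$. The following minimal sketch uses only NumPy and hypothetical noise-free data generated from $y = 1 + 2x + 3x^2$. It shows that least squares on that matrix recovers the coefficients.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x**2

degree = 2
# Design matrix with columns [1, x, x^2]; increasing=True puts the constant column first
A = np.vander(x, N=degree + 1, increasing=True)

# Ordinary least squares: the model is linear in the unknown coefficients
coeffs, residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # approximately [1. 2. 3.]
```

The non-linearity lives entirely in how the columns of `A` are built. The solver itself is the same one used for plain linear regression.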
2. When to Use Polynomial Regression
- When the relationship between the independent variable and the dependent variable is non-linear.
- When linear regression models fail to provide an accurate fit to the data due to curvatures or patterns that a straight line cannot capture.
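To illustrate the second point, here is a small sketch with hypothetical data $y = x^2$ on a symmetric range. It compares a straight-line fit with a quadratic fit using R-squared on the same points. On this symmetric data, the best straight line is flat, so its R-squared is close to 0, while the quadratic fits exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data: y = x^2 on a symmetric range
X = np.linspace(-3, 3, 13).reshape(-1, 1)
y = X.ravel() ** 2

# Straight-line fit on the raw feature
line = LinearRegression().fit(X, y)
r2_line = line.score(X, y)

# Quadratic fit on the features [x, x^2]
X_quad = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
quad = LinearRegression().fit(X_quad, y)
r2_quad = quad.score(X_quad, y)

print(f"linear R^2 = {r2_line:.3f}, quadratic R^2 = {r2_quad:.3f}")
```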
3. Polynomial Regression in Python
Python’s scikit-learn library provides tools to easily implement polynomial regression by transforming the original feature(s) into polynomial features and then fitting a linear regression model to these transformed features.
Step-by-Step Example: Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Example data: Feature (X) and target (Y)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
Y = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81]) # Quadratic relationship (Y = X^2)
# Create a PolynomialFeatures object (degree 2 for quadratic)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_poly, Y)
# Predict using the polynomial model
Y_pred = model.predict(X_poly)
# Plot the original data and the polynomial regression curve
plt.scatter(X, Y, color='blue') # Original data points
plt.plot(X, Y_pred, color='red') # Polynomial regression line
plt.title('Polynomial Regression (degree 2)')
plt.xlabel('Feature X')
plt.ylabel('Target Y')
plt.show()
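# Print the coefficients (ordered like the columns [1, X, X^2]; the first is ~0
# because LinearRegression fits the intercept separately)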
print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
Explanation:
- Transforming Features: PolynomialFeatures(degree=2) transforms the original feature $X$ into the polynomial features $1, X, X^2$.
- Fitting the Model: A linear regression model is fitted to the transformed features.
- Plotting: The red line shows the polynomial regression curve that fits the data.
In this example, since $Y = X^2$, a quadratic polynomial captures the relationship exactly.
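A common variation wraps the two steps in a scikit-learn pipeline, so the same feature transform is applied automatically when predicting new inputs. The sketch below also sets `include_bias=False`. This avoids the redundant column of ones, because `LinearRegression` already fits the intercept. It reuses the quadratic data above. The inputs 10 and 12 are hypothetical unseen values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
Y = X.ravel() ** 2  # same quadratic data as above

# include_bias=False drops the constant column; LinearRegression fits the intercept itself
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(X, Y)

# The pipeline transforms the new inputs before predicting
preds = model.predict(np.array([[10], [12]]))
print(preds)  # approximately [100. 144.]

# make_pipeline names each step after its class, in lowercase
lin = model.named_steps["linearregression"]
print(lin.coef_, lin.intercept_)  # approximately [0. 1.] and 0.0
```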
4. Polynomial Regression with Higher Degrees
You can use polynomial regression of higher degrees (e.g., cubic, quartic) if the relationship between the variables is more complex. However, beware of overfitting: a polynomial degree that is too high can make the model capture noise in the data rather than the underlying pattern.
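To make the overfitting warning concrete, here is a minimal sketch on hypothetical noisy data: a sine curve plus Gaussian noise, with a fixed random seed. It fits several degrees on half the points and scores each fit on the other half. Training MSE can only stay the same or fall as the degree grows, because each lower-degree model is a special case of the higher-degree one. The held-out MSE is the number that reveals overfitting, and its exact values depend on the random sample.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
# Hypothetical noisy data: a sine curve plus Gaussian noise
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(scale=0.2, size=30)

# Alternate points: even indices for training, odd indices held out
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

train_mse, test_mse = {}, {}
for degree in [1, 3, 9]:
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE {train_mse[degree]:.4f}, "
          f"held-out MSE {test_mse[degree]:.4f}")
```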
Example: Polynomial Regression with Degree 4
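import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Example data: Feature (X) and target (Y)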
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
Y = np.array([1, 4, 9, 20, 35, 48, 65, 80, 95]) # Slightly more complex relationship
# Create a PolynomialFeatures object (degree 4)
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_poly, Y)
# Predict using the polynomial model
Y_pred = model.predict(X_poly)
# Plot the original data and the polynomial regression curve
plt.scatter(X, Y, color='blue') # Original data points
plt.plot(X, Y_pred, color='red') # Polynomial regression line
plt.title('Polynomial Regression (degree 4)')
plt.xlabel('Feature X')
plt.ylabel('Target Y')
plt.show()
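In this example:
- The degree-4 polynomial follows the training points more closely than a linear or quadratic model would. A closer fit to the training data does not by itself mean better predictions on new data, which is why evaluation and overfitting are discussed next.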
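The comparison can be checked directly on the data from this example. The sketch below computes training-set R-squared for degrees 1, 2 and 4. Because the lower-degree models are nested inside the higher-degree ones, training R-squared cannot decrease as the degree grows. So this comparison alone does not show that the quartic generalizes better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Same data as the degree-4 example
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
Y = np.array([1, 4, 9, 20, 35, 48, 65, 80, 95])

train_r2 = {}
for degree in [1, 2, 4]:
    X_poly = PolynomialFeatures(degree=degree).fit_transform(X)
    pred = LinearRegression().fit(X_poly, Y).predict(X_poly)
    train_r2[degree] = r2_score(Y, pred)  # scored on the same points used for fitting
    print(f"degree {degree}: training R^2 = {train_r2[degree]:.4f}")
```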
5. Evaluating Polynomial Regression
As with linear regression, you can evaluate the performance of polynomial regression models using metrics such as:
- Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values.
- R-squared: Represents the proportion of variance explained by the model.
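Both metrics follow short formulas. MSE is the mean of the squared residuals. R-squared is $1 - SS_{res}/SS_{tot}$, where $SS_{tot}$ is the sum of squared deviations of the targets from their mean. The sketch below computes them by hand on hypothetical values and checks the results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

mse_manual = np.mean((y_true - y_pred) ** 2)

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(f"MSE = {mse_manual:.3f}, R^2 = {r2_manual:.3f}")  # MSE = 0.125, R^2 = 0.975
```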
Example: Evaluating the Model
from sklearn.metrics import mean_squared_error, r2_score
# Y and Y_pred come from the degree-4 example above, so these are training-set scores
# Calculate mean squared error
mse = mean_squared_error(Y, Y_pred)
# Calculate R-squared
r2 = r2_score(Y, Y_pred)
print('Mean Squared Error (MSE):', mse)
print('R-squared:', r2)
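The scores above are computed on the same points used for fitting, so they can look optimistic. A minimal sketch of a held-out evaluation follows, using `train_test_split` on hypothetical noisy quadratic data with fixed random seeds. The model never sees the test points during fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
# Hypothetical noisy quadratic data
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2 + rng.normal(scale=0.5, size=60)

# Hold out 25% of the points for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

y_test_pred = model.predict(X_test)
test_mse = mean_squared_error(y_test, y_test_pred)
test_r2 = r2_score(y_test, y_test_pred)
print(f"test MSE = {test_mse:.3f}, test R^2 = {test_r2:.3f}")
```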
6. Overfitting and Underfitting in Polynomial Regression
- Underfitting occurs when the model is too simple to capture the underlying trend in the data. For example, using a linear regression model when the true relationship is quadratic.
- Overfitting occurs when the model is too complex and fits the noise in the training data rather than the underlying trend. This can happen when using a polynomial of too high a degree.
You can balance this trade-off by tuning the degree of the polynomial and using techniques like cross-validation to ensure the model generalizes well to new data.
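One way to tune the degree with cross-validation is sketched below. It uses 5-fold `cross_val_score` on hypothetical noisy data whose true trend is cubic, and it picks the degree with the lowest average validation MSE. Scikit-learn reports this metric negated (`neg_mean_squared_error`) because its scorers are maximized, so the sketch flips the sign. The candidate range 1 to 8 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
# Hypothetical noisy data whose true trend is cubic
X = rng.uniform(-2, 2, size=(80, 1))
y = X.ravel() ** 3 - 2 * X.ravel() + rng.normal(scale=0.3, size=80)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Scores are negated MSE values, one per fold
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()
    print(f"degree {degree}: cross-validated MSE {cv_mse[degree]:.3f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("selected degree:", best_degree)
```

Degrees 1 and 2 cannot represent the cubic term, so their validation error stays high. Beyond degree 3, extra terms mostly add variance.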
7. Polynomial Regression for Multiple Features
Polynomial regression can also be applied to datasets with multiple features. In that case, PolynomialFeatures also generates interaction terms (products of different features), so interactions between features are modeled as well.
Example: Polynomial Regression with Multiple Features
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
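# Example data with two features
# Note: here x2 = x1 + 1 and Y = x1^2 + x1 + 1 (equivalently x1*x2 + 1), so the two
# features are perfectly collinear and the individual coefficients are not uniquely determined
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
Y = np.array([3, 7, 13, 21, 31])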
# Create PolynomialFeatures object (degree 2)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_poly, Y)
# Make predictions
Y_pred = model.predict(X_poly)
# Print the coefficients and intercept
print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)
In this case, PolynomialFeatures(degree=2) generates the columns $1, x_1, x_2, x_1^2, x_1 x_2, x_2^2$. These are the square of each feature plus the interaction term $x_1 x_2$.
8. Applications of Polynomial Regression
Polynomial regression is widely used in fields such as:
- Physics: Modeling non-linear relationships between physical quantities.
- Economics: Fitting curves to data involving supply-demand or pricing models.
- Finance: Modeling complex relationships between financial indicators.
- Biology: Fitting growth curves or other biological phenomena.
Conclusion
Polynomial regression is a powerful extension of linear regression that models non-linear relationships by transforming the original features into polynomial features. While it can provide better fits for non-linear data, it’s important to avoid overfitting by selecting an appropriate degree for the polynomial, ideally with held-out data or cross-validation. Python’s scikit-learn library makes it easy to implement polynomial regression, and with proper evaluation, it is a useful tool for modeling curved relationships.