October 2022 ~ Python and Machine Learning Blog

Friday, 7 October 2022

8. Data Analysis - Visualization with Matplotlib

October 07, 2022

Basic line plot

%matplotlib notebook -- renders interactive plots inside a Jupyter notebook (%matplotlib inline gives static images)

import matplotlib.pyplot as plt

import numpy as np

X = np.arange(10)

plt.plot(X)

Create figure and Plot in two lines

1. Create Figure

fig = plt.figure()

2. Create a subplot (Axes) and add it to the figure

ax1 = fig.add_subplot(2,2,1) -- a grid of 2 rows and 2 columns; this selects the 1st subplot.

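Plotting without an explicit subplot (plt.plot)

fig = plt.figure()

plt.plot(np.random.rand(50), 'k-') -- 'k-' is a black solid line. plt.plot draws on the current (most recently created) Axes, e.g. the bottom-right subplot if it was added last; on a figure with no Axes it creates a new one.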
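A minimal sketch of the figure/subplot workflow above, assuming it runs as a plain script (hence the non-interactive "Agg" backend instead of %matplotlib notebook) and using random data for illustration. It shows that plt.plot lands on the subplot created last.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)  # top-left cell of a 2x2 grid
ax2 = fig.add_subplot(2, 2, 2)  # top-right cell
ax3 = fig.add_subplot(2, 2, 3)  # bottom-left cell (created last)

ax1.hist(np.random.randn(100), bins=20, color="k", alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

# plt.plot has no Axes argument: it draws on the current Axes, i.e. ax3
plt.plot(np.random.randn(50).cumsum(), "k--")

n_axes = len(fig.axes)
```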

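Creating a figure and a grid of axes in one line.

fig, axes = plt.subplots(2,3)

axes[0,1].hist(np.random.randn(100)) -- axes is a 2x3 NumPy array indexed by [row, column]; hist() needs the data to bin.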

Sharing the same X and Y axes across all subplots.

fig, axes = plt.subplots(2,3, sharex=True, sharey = True)

Remove space between subplots.

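fig, axes = plt.subplots(2,3, sharex=True, sharey = True)
fig.subplots_adjust(wspace=0, hspace=0) -- wspace/hspace are not plt.subplots arguments; set them on the figure afterwards.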
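A small sketch of a shared-axis grid with the gaps removed, assuming random normal data in every cell and the headless "Agg" backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)
for i in range(2):
    for j in range(3):
        # same kind of data in each cell so the shared axes make sense
        axes[i, j].hist(np.random.randn(500), bins=50, color="k", alpha=0.5)

# wspace/hspace are fractions of the average axis width/height; 0 removes the gaps
fig.subplots_adjust(wspace=0, hspace=0)

shape = axes.shape
```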

Adding color and linestyle (these are arguments of plot(), not of plt.subplots).

data = np.random.randn(30).cumsum()

axes[0,0].plot(data, linestyle="--", color="r")

axes[0,0].plot(data, "r--") -- short form (format string).

Add Marker

axes[0,0].plot(data, linestyle="--", color="r", marker="o")

Connecting the dots (drawstyle)

axes[0,0].plot(data, linestyle="--", color="r", marker="o", drawstyle="steps-post") -- other options: "steps-pre", "steps-mid", "steps" (same as "steps-pre") and "default"

Add label

axes[0,0].plot(data, linestyle="--", color="r", marker="o", drawstyle="steps-post", label="line")

axes[0,0].legend() -- the label is only displayed once legend() is called
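A self-contained sketch of the style options above on a single Axes, assuming random-walk data and the headless "Agg" backend. It compares the keyword form with the format-string form and reads the styles back from the created lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(30).cumsum()
fig, ax = plt.subplots()

# Long form: keyword arguments for colour, line style and marker
ax.plot(data, color="r", linestyle="--", marker="o", label="dashed")
# Short form: the format string "k-" means black solid line; steps are drawn after each point
ax.plot(data + 5, "k-", drawstyle="steps-post", label="steps-post")
ax.legend(loc="best")  # labels are shown only once legend() is called

styles = [(line.get_linestyle(), line.get_drawstyle()) for line in ax.lines]
legend_text = [t.get_text() for t in ax.get_legend().get_texts()]
```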

X-axis ticks and tick labels

fig = plt.figure()

ax1= fig.add_subplot(1,1,1)

ticks = ax1.set_xticks([0,10,20,30])

labels = ax1.set_xticklabels(["zero", "ten","twenty","thirty"])

ax1.plot(np.random.randn(31).cumsum()) -- plot() needs data; here 31 points so x runs from 0 to 30

Change the rotation and font size of the X tick labels

fig = plt.figure()

ax1= fig.add_subplot(1,1,1)

ticks = ax1.set_xticks([0,10,20,30])

labels = ax1.set_xticklabels(["zero", "ten","twenty","thirty"], rotation = 90, fontsize="large")

ax1.plot(np.random.randn(31).cumsum())

X-axis label, Title

ax1.set_xlabel("Xlabels")

ax1.set_title("Title")
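A sketch combining custom ticks, rotated tick labels, axis labels and a title, assuming a random walk of 31 points and the headless "Agg" backend. The canvas is drawn once so the tick-label text can be read back.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(np.random.randn(31).cumsum())  # x runs from 0 to 30

ax1.set_xticks([0, 10, 20, 30])  # tick positions
ax1.set_xticklabels(["zero", "ten", "twenty", "thirty"], rotation=30, fontsize="small")
ax1.set_xlabel("Stages")
ax1.set_ylabel("Value")
ax1.set_title("My first matplotlib plot")

fig.canvas.draw()  # render once so the tick labels are fully populated
tick_text = [t.get_text() for t in ax1.get_xticklabels()]
```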

Plots

Vertical bar plot

ax1.bar(["Car", "Truck", "Bus", "Auto"], [10,20,30,40]) -- categories on the X axis, numeric bar heights

Horizontal bar plot

ax1.barh(["Car", "Truck", "Bus", "Auto"], [10,20,30,40]) -- categories on the Y axis, numeric bar widths

Histogram

ax1.hist(X, bins = 50)

Pie chart

ax1.pie([10,20,30], labels=["car", "bus", "truck"])

Scatter plot

ax1.scatter(x,y, marker="^", color="g") -- x and y: numeric arrays of equal length

Box/Violin plot

ax1.boxplot(X)

ax1.violinplot(X)
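All of the plot types above can be placed on one figure; this sketch assumes randomly generated samples (seeded for repeatability), the category data from the notes, and the headless "Agg" backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
vehicles, counts = ["Car", "Truck", "Bus", "Auto"], [10, 20, 30, 40]

fig, axes = plt.subplots(2, 4, figsize=(14, 6))
axes[0, 0].bar(vehicles, counts)                  # vertical bars: categories on X
axes[0, 1].barh(vehicles, counts)                 # horizontal bars: categories on Y
axes[0, 2].hist(rng.normal(size=1000), bins=50)   # 50 equal-width bins
axes[0, 3].pie([10, 20, 30], labels=["car", "bus", "truck"])
axes[1, 0].scatter(rng.random(50), rng.random(50), marker="^", color="g")

samples = [rng.normal(0, 1, 200), rng.normal(2, 0.5, 200)]
axes[1, 1].boxplot(samples)      # median, quartiles, whiskers, outliers
axes[1, 2].violinplot(samples)   # density shape of each sample
axes[1, 3].remove()              # unused grid cell
```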

1. Python Machine learning - Regularized Linear Model

October 06, 2022

Linear Regression

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

lin_reg.fit(X, y)

print(lin_reg.intercept_, lin_reg.coef_)

lin_reg.predict(X_new)
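The notes leave X, y and X_new undefined; this sketch assumes synthetic data following y = 4 + 3x plus Gaussian noise, so the fitted intercept and coefficient should land near 4 and 3. Note that scikit-learn expects X as a 2-D array of shape (n_samples, n_features).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                 # 100 samples, 1 feature, shape (100, 1)
y = 4 + 3 * X[:, 0] + rng.normal(size=100)   # assumed linear relation plus noise

lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.intercept_, lin_reg.coef_)     # close to 4 and [3]

X_new = np.array([[0.0], [2.0]])             # new inputs must also be 2-D
y_new = lin_reg.predict(X_new)
```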

Polynomial Regression

Transform to polynomial features

from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree =2, include_bias=False)

X_poly = poly_features.fit_transform(X) -- for a single input column, X_poly has two columns: x (degree 1) and x² (degree 2)

Now fit Linear Regression on the expanded features

lin_reg = LinearRegression()

lin_reg.fit(X_poly, y)

print(lin_reg.intercept_, lin_reg.coef_)
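A sketch of the two polynomial-regression steps, assuming synthetic quadratic data y = 0.5x² + x + 2 plus noise. Because the model is linear in the expanded features [x, x²], the coefficients should come out near [1, 0.5] and the intercept near 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = 6 * rng.random((100, 1)) - 3                              # x in [-3, 3)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=100)  # assumed quadratic + noise

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)      # columns: x, x**2

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)     # roughly 2 and [1, 0.5]
```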

Training and validation error (learning curves)

from sklearn.metrics import mean_squared_error

from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train) + 1):
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])  # training set size grows from 1 to len(X_train)
        y_val_predict = model.predict(X_val)  # always evaluated on all validation samples
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="train")
    plt.plot(np.sqrt(val_errors), "b-+", linewidth=2, label="validation")
    plt.legend()

Using Pipeline

from sklearn.pipeline import Pipeline

polynomial_regression = Pipeline ([

("poly features", PolynomialFeatures(degree=10, include_bias=False)),

("lin reg", LinearRegression())

])

plot_learning_curves(polynomial_regression, X, y)
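To inspect learning-curve numbers without a plot, here is a variant of the function that returns the RMSE lists instead of drawing them. The helper name learning_curve_rmse and the quadratic data are assumptions for illustration; the degree-10 pipeline matches the notes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def learning_curve_rmse(model, X, y, seed=42):
    """Hypothetical helper: RMSE on the first m training rows and on the full validation set."""
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    train_rmse, val_rmse = [], []
    for m in range(1, len(X_train) + 1):
        model.fit(X_train[:m], y_train[:m])
        train_rmse.append(np.sqrt(mean_squared_error(y_train[:m], model.predict(X_train[:m]))))
        val_rmse.append(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
    return train_rmse, val_rmse

rng = np.random.default_rng(0)
X = 6 * rng.random((100, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(size=100)

polynomial_regression = Pipeline([
    ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
train_rmse, val_rmse = learning_curve_rmse(polynomial_regression, X, y)
```

With one training sample the model fits it exactly, so the training error starts at zero and rises as m grows, while the validation error typically falls.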

Gradient Descent

Batch Gradient Descent

Stochastic Gradient Descent

Mini Batch Gradient Descent
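A sketch of batch gradient descent for linear regression in NumPy, compared with the closed-form least-squares solution and with scikit-learn's SGDRegressor (stochastic: one sample per update). The data, learning rate and iteration count are assumptions. Each batch step uses the gradient of the MSE over all m samples, 2/m · Xᵀ(Xθ − y); mini-batch gradient descent would use a random subset per step and is not shown.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=m)
X_b = np.c_[np.ones((m, 1)), X]   # add x0 = 1 so theta[0] is the intercept

# Batch gradient descent: every step uses all m samples
eta, n_iterations = 0.1, 1000
theta = rng.normal(size=2)        # random initialisation
for _ in range(n_iterations):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)   # gradient of the MSE
    theta -= eta * gradients

# Closed-form least squares for comparison
theta_best = np.linalg.lstsq(X_b, y, rcond=None)[0]

# Stochastic gradient descent in scikit-learn (no regularisation)
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1, random_state=42)
sgd_reg.fit(X, y)
```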

Ridge Regression

from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=1, solver="cholesky")

ridge_reg.fit(X,y)

ridge_reg.predict([[1.5]]) -- scikit-learn expects 2-D input: one row with one feature

Lasso Regression ( Least Absolute Shrinkage and Selection Operator Regression )

from sklearn.linear_model import Lasso

lasso_reg = Lasso(alpha = 0.1)

lasso_reg.fit(X, y)

lasso_reg.predict([[1.5]])

Elastic net

from sklearn.linear_model import ElasticNet

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)

elastic_net.fit(X,y)

elastic_net.predict([[1.5]])
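A sketch comparing the three regularised models on assumed synthetic data where only the first two of five features matter. It illustrates the usual difference: Ridge (L2) shrinks every weight but keeps them non-zero, while Lasso (L1) tends to set the irrelevant weights to exactly zero; Elastic Net mixes both via l1_ratio.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Assumed ground truth: only features 0 and 1 matter; 2, 3, 4 are pure noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "ridge": Ridge(alpha=1, solver="cholesky"),          # L2 penalty
    "lasso": Lasso(alpha=0.1),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of L1 and L2
}
for name, model in models.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 2), model.predict([[1.5, 0, 0, 0, 0]]))
```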

7. Data Analysis - Data Aggregation & Grouping

October 04, 2022

Split, Apply, Combine

Mean

X["Values1"].groupby([X["Keys1"]]).mean() -- mean of "Values1" for each distinct value of "Keys1"
X["Values1"].groupby([X["Keys1"],X["Keys2"]]).mean()
X.groupby("Keys1").mean() -- every non-key column must be numeric (or pass numeric_only=True)
X.groupby([X["Keys1"], X["Keys2"]]).mean()

Count

X.groupby([X["Keys1"], X["Keys2"]]).size() -- number of rows in each group
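A small sketch of grouped means and counts on an assumed toy DataFrame using the column names from the notes.

```python
import pandas as pd

X = pd.DataFrame({
    "Keys1": ["a", "a", "b", "b", "a"],
    "Keys2": ["one", "two", "one", "two", "one"],
    "Values1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Values2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

mean_by_key1 = X["Values1"].groupby(X["Keys1"]).mean()   # one value per Keys1 label
mean_by_both = X.groupby(["Keys1", "Keys2"]).mean()      # all remaining (numeric) columns
size_by_both = X.groupby(["Keys1", "Keys2"]).size()      # rows per (Keys1, Keys2) pair
```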

Iterating over groups with a for loop

for name, group in X.groupby("Key1"):
    print(name)
    print(group)

Accessing each group through a dictionary keyed by group name

dict(list(X.groupby("Key1")))
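A sketch of iterating over groups and collecting them into a dictionary, assuming a toy DataFrame with a "Key1" column. Grouping by a single column name (a string, not a list) yields plain labels such as "a" as group names.

```python
import pandas as pd

X = pd.DataFrame({"Key1": ["a", "b", "a", "b"], "Values1": [1, 2, 3, 4]})

for name, group in X.groupby("Key1"):
    print(name)    # group label, e.g. "a"
    print(group)   # sub-DataFrame with the rows of that group

pieces = dict(list(X.groupby("Key1")))   # {"a": DataFrame, "b": DataFrame}
```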

Grouping columns by data type

X.groupby(X.dtypes, axis =1) -- columns are separated by dtype (groupby with axis=1 is deprecated since pandas 2.1; X.T.groupby(X.dtypes) is an alternative)

Column Grouping

Create a mapping from column to grouping

mapColumns={'a':'Group1', 'b':'Group1', 'c':'Group2', 'd':'Group2', 'e':'Group3', 'f':'Group3'}

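X.groupby(mapColumns, axis=1).mean() -- mean across the columns of each group; with pandas 2.1+ (axis=1 deprecated) use X.T.groupby(mapColumns).mean().T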

Passing Lambda function in groupby

X.groupby(lambda x: x < 'c', axis=1).mean() -- the lambda receives each column name; two groups (False/True) are created.
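Because groupby(..., axis=1) is deprecated from pandas 2.1, this sketch groups columns by transposing, grouping the rows, and transposing back. The 2x6 data is an assumption; mapColumns is the mapping from the notes.

```python
import pandas as pd

X = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=["a", "b", "c", "d", "e", "f"],
)
mapColumns = {"a": "Group1", "b": "Group1", "c": "Group2",
              "d": "Group2", "e": "Group3", "f": "Group3"}

# Columns become rows, rows are grouped via the mapping, then transposed back
by_mapping = X.T.groupby(mapColumns).mean().T
# The function is applied to each column name: True for "a" and "b"
by_lambda = X.T.groupby(lambda col: col < "c").mean().T
```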

Aggregate function

def max_min(x):
    return x.max() - x.min()

X.agg(["max", "min", max_min]) -- each function (min, max, max_min) is applied to every column.

Aggregate function with groupby

X1 = X.groupby(["key1"]) -- a GroupBy object; nothing is computed until a function is applied
X1.agg(["min", "max", max_min]) -- applies each aggregate function to every column of every group.

Aggregate function with custom defined column name

X1.agg([("Maximum", "max"), ("Minimum", "min"), ("Maximum_Minimum", max_min)]) -- a list of (column name, function) tuples
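A sketch of the aggregate calls above on an assumed toy DataFrame. Built-in aggregations are passed as strings ("max", "min"); the custom max_min function is passed as an object, and its name becomes the row or column label.

```python
import pandas as pd

def max_min(x):
    """Spread of a column or group: max - min."""
    return x.max() - x.min()

X = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "b"],
    "data1": [1, 4, 2, 8, 5],
    "data2": [10, 30, 20, 20, 60],
})

whole = X[["data1", "data2"]].agg(["max", "min", max_min])   # one row per function
X1 = X.groupby("key1")
per_group = X1["data1"].agg(["min", "max", max_min])         # one column per function
named = X1["data1"].agg([("Maximum", "max"), ("Minimum", "min"),
                         ("Maximum_Minimum", max_min)])      # custom column names
```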

GroupBy with Apply

X.groupby("key1").apply(fName) -- fName: any function that receives each group as a DataFrame
X.groupby(["key1"]).apply(lambda x: x.min()) -- a lambda is shown here; a user-defined function works the same way.

GroupBy with Apply, passing arguments to the function

def minimumFb(X, x):
    return X.max() - X.min() > x

X.groupby(["key1"]).apply(minimumFb, x=10) -- extra arguments after the function are forwarded to it
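A sketch of apply with and without extra arguments, assuming a toy DataFrame. Selecting the "data1" column first keeps the grouping column out of the function's input.

```python
import pandas as pd

X = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "b"],
    "data1": [1, 4, 2, 8, 5],
})

smallest = X.groupby("key1")["data1"].apply(lambda s: s.min())   # same result as .min()

def minimumFb(group, x):
    """True if the spread (max - min) of the group exceeds x."""
    return group.max() - group.min() > x

# Keyword (or positional) arguments after the function are passed on to it
wide_spread = X.groupby("key1")["data1"].apply(minimumFb, x=5)
```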

Apply function with bucket analysis

buckets = pd.cut(X.Values1, 4) -- 4 equal-width bins; pd.qcut(X.Values1, 4) gives true quartiles (equal counts)
X.groupby(buckets).apply(lambda x: x.min())
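A sketch of bucket analysis on assumed random data, contrasting pd.cut (equal-width bins) with pd.qcut (equal-count bins). observed=False keeps every bin of the categorical grouper in the result, even an empty one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = pd.DataFrame({"Values1": rng.normal(size=100), "Values2": rng.normal(size=100)})

buckets = pd.cut(X["Values1"], 4)                   # 4 equal-width bins
quartiles = pd.qcut(X["Values1"], 4, labels=False)  # 4 equal-count bins labelled 0..3

stats = X["Values2"].groupby(buckets, observed=False).agg(["min", "max", "count"])
counts_q = X["Values2"].groupby(quartiles).count()
```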

Pivot Table

X.pivot_table(values="AvgSizeOfTrip" , index = "Gender" , columns="Group" ) -- default aggregation is the mean

Pivot table with aggregation

X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc="sum")
X.pivot_table (values=["Col1", "Col2"] , index = "Gender" , columns="Group", aggfunc={"Col1":"sum", "Col2":"mean"}) -- different operation for each column; the dictionary keys must be among the value columns.

Pivot table with count

pd.crosstab(X.Gender, X.Group) -- counts how often each (Gender, Group) combination occurs.
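A sketch of pivot tables and a crosstab on an assumed six-row dataset that reuses the column names from the notes.

```python
import pandas as pd

X = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M", "F"],
    "Group": ["A", "A", "B", "B", "A", "B"],
    "AvgSizeOfTrip": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Default aggfunc is the mean of each (Gender, Group) cell
mean_table = X.pivot_table(values="AvgSizeOfTrip", index="Gender", columns="Group")
sum_table = X.pivot_table(values="AvgSizeOfTrip", index="Gender", columns="Group", aggfunc="sum")
counts = pd.crosstab(X["Gender"], X["Group"])   # frequency of each (Gender, Group) pair
```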

6. Data Analysis - Data Wrangling

October 04, 2022

Hierarchical Indexing in Pandas Series

Create multi level indexing in pandas series

X = pd.Series(np.random.rand(9), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,3,7,8,8]]) -- index pairs must be unique for unstack() to work

Access the values

X['a'] -- using exact index
X[:, 1] -- using slice
X['a':'b', 1] -- using slice

Access Index

X.index -- a MultiIndex whose entries are (outer, inner) tuples

Changing multi level indexing to row/col

X.unstack() -- to row/col (DataFrame) representation; same as level=-1
X.unstack(level = -1) -- first level index will be ROW and second level index will be column
X.unstack(level=0) -- first level index will be Column and second level index will be row

Changing a DataFrame back to a single pandas Series

X.stack() -- to multi level index, row will become 1st level index and column will become second level index.
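A sketch of hierarchical indexing on a Series, assuming a complete 3x3 grid of labels (so unstack leaves no NaN and stack restores the original Series exactly) and a sorted index, which slicing on the outer level relies on.

```python
import numpy as np
import pandas as pd

X = pd.Series(
    np.arange(9.0),
    index=[["a", "a", "a", "b", "b", "b", "c", "c", "c"],
           [1, 2, 3, 1, 2, 3, 1, 2, 3]],
)

a_part = X["a"]            # inner-level Series for outer label "a"
level1_eq_1 = X[:, 1]      # inner label 1 under every outer label
ab_part = X.loc["a":"b"]   # slice on the outer level (index is sorted)

wide = X.unstack()           # outer level -> rows, inner level -> columns
wide_t = X.unstack(level=0)  # outer level -> columns, inner level -> rows
back = wide.stack()          # back to a MultiIndex Series
```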

Hierarchical indexing in Pandas Dataframe

Create multi level indexing for both row and columns

X=pd.DataFrame(np.random.rand(9, 6), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]],
columns=[["col1","col1","col2","col2","col3","col3"],['c1','c2', 'c3', 'c4', 'c5', 'c6']])

Access the level names of rows and columns

X.index.names
X.columns.names

Access row/column values

X.loc["a", ["col1", "col2"]] -- rows with outer label 'a', all sub-columns of col1 and col2

Swap multilevel index

X1.swaplevel("Row1", "Row2") -- levels Row1 and Row2 are interchanged.

Sort index

X1.sort_index(level=0)
X1.sort_index(level=1)

Aggregation of data

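X1.groupby(level="Row1").sum() -- summation for each Row1 label (older pandas: X1.sum(level="Row1"), removed in pandas 2.0)
X1.groupby(level="Row2").sum()
X1.T.groupby(level="ColName1").sum().T -- summation over the columns sharing a ColName1 label
X1.T.groupby(level="ColName2").sum().T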
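A combined sketch of the multi-level DataFrame operations above: selection with loc, swapping and sorting levels, and summing by level. The state/colour labels and the 4x3 integer data are assumptions; the level names Row1, Row2, ColName1 and ColName2 follow the notes. Both index and columns are built in sorted order.

```python
import numpy as np
import pandas as pd

X1 = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["Row1", "Row2"]),
    columns=pd.MultiIndex.from_arrays(
        [["Colorado", "Ohio", "Ohio"], ["Green", "Green", "Red"]],
        names=["ColName1", "ColName2"]),
)

ohio_a = X1.loc["a", ["Ohio"]]                               # outer row "a", all Ohio sub-columns
swapped = X1.swaplevel("Row1", "Row2").sort_index(level=0)   # Row2 becomes the outer level
row_sums = X1.groupby(level="Row2").sum()                    # sum rows sharing a Row2 label
col_sums = X1.T.groupby(level="ColName2").sum().T            # sum columns sharing a ColName2 label
```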

Changing column to index

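X.set_index(['b', 'a']) -- columns 'b' and 'a' become a two-level row index and are removed from the columns
X.set_index(['b', 'a'], drop=False) -- same index, but 'b' and 'a' are also kept as columns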

Change index to column

X.reset_index() -- the index levels move back into ordinary columns
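A sketch of moving columns into the index and back, assuming a toy DataFrame with columns a, b and c.

```python
import pandas as pd

X = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "x", "y"], "c": [0.5, 1.5, 2.5]})

indexed = X.set_index(["b", "a"])             # "b" and "a" move into a two-level index
kept = X.set_index(["b", "a"], drop=False)    # same index, columns stay as well
restored = indexed.reset_index()              # index levels become columns again
```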

Merging data sources

If the DataFrames share a column name - INNER JOIN (default)

pd.merge(df1, df2 ) -- joins on the common column(s); the remaining columns are placed side by side. (OR)
pd.merge(df1, df2, how="inner")

If the key columns have different names

pd.merge(df1, df2, left_on="lkey", right_on="rkey") -- left_on and right_on must be given together

OUTER JOIN

pd.merge(df1, df2, how="outer") -- INNER JOIN + unmatched rows of both DataFrames; missing values become NaN.

LEFT/RIGHT Join

pd.merge(df1, df2, how="left") -- INNER JOIN + unmatched rows of the left DataFrame
pd.merge(df1, df2, how="right") -- INNER JOIN + unmatched rows of the right DataFrame.

It is possible to have more than one column as key.

pd.merge(df1, df2, on=["key1", "key2"]) -- or left_on=[...] together with right_on=[...] when the names differ
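A sketch of the join types on two assumed toy DataFrames that share a "key" column, plus a merge on differently named keys (the name "rkey" is hypothetical).

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c", "d"], "data1": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["a", "b", "e"], "data2": [10, 20, 30]})

inner = pd.merge(df1, df2)                 # joins on the shared "key" column
outer = pd.merge(df1, df2, how="outer")    # keys a..e, NaN where one side is missing
left = pd.merge(df1, df2, how="left")      # every row of df1

df3 = df2.rename(columns={"key": "rkey"})
renamed = pd.merge(df1, df3, left_on="key", right_on="rkey")   # different key names
```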

Concatenate

pd.concat([X, Y], axis=1) -- merged side by side
pd.concat([X,Y], axis =0) -- stacked top to bottom.
pd.concat([X,Y], ignore_index=True) -- stacked top to bottom ignoring the original row index; pandas creates a new index 0, 1, 2, ...
pd.concat([X,Y], ignore_index=False) -- the original row index labels are retained (default).
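A sketch of concatenation on two assumed one-column DataFrames whose row labels partly overlap.

```python
import pandas as pd

X = pd.DataFrame({"A": [1, 2]}, index=["r1", "r2"])
Y = pd.DataFrame({"A": [3, 4]}, index=["r1", "r3"])

side_by_side = pd.concat([X, Y], axis=1)           # rows aligned on the index labels
stacked = pd.concat([X, Y], axis=0)                # original labels kept, even duplicates
renumbered = pd.concat([X, Y], ignore_index=True)  # new index 0, 1, 2, 3
```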

Combining two DataFrames to fill missing values

X1.combine_first(X2) -- values of X2 fill the NaN values in X1; existing X1 values are kept.
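A sketch of combine_first on two assumed DataFrames with the same shape.

```python
import numpy as np
import pandas as pd

X1 = pd.DataFrame({"a": [1.0, np.nan, 5.0], "b": [np.nan, 2.0, np.nan]})
X2 = pd.DataFrame({"a": [9.0, 4.0, 9.0], "b": [7.0, 9.0, 6.0]})

filled = X1.combine_first(X2)   # keep X1 values; use X2 only where X1 is NaN
```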

Pivot and Melt function

X.pivot(index="c1", columns="c2", values="col3") -- 'c1' becomes ROW, 'c2' becomes column and 'col3' becomes value across 'c1' and 'c2'
X.pivot(index="c1", columns="c2", values=["col3","col4"]) -- 'col3' and 'col4' values are stacked side by side in the o/p

Melt function

pd.melt(X, ["col1"]) -- "col1" stays as the identifier; the result has three columns: col1, variable (names of all other columns) and value (values of all other columns).
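A sketch of pivot and melt as inverse reshapes, assuming a small long-format table with the column names from the notes. The column-axis name left by pivot is cleared before melting so the output uses the default "variable" and "value" names.

```python
import pandas as pd

long_df = pd.DataFrame({
    "c1": ["2022-01", "2022-01", "2022-02", "2022-02"],
    "c2": ["x", "y", "x", "y"],
    "col3": [1, 2, 3, 4],
    "col4": [10, 20, 30, 40],
})

wide = long_df.pivot(index="c1", columns="c2", values="col3")                # rows c1, columns c2
wide_two = long_df.pivot(index="c1", columns="c2", values=["col3", "col4"])  # hierarchical columns

# Back to long format: c1 is the identifier, the x/y columns are melted
melted = pd.melt(wide.reset_index().rename_axis(columns=None), id_vars=["c1"])
```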
