Join my course at Udemy (Python Programming Bible-From beginner to advanced )


Friday, 7 October 2022

8. Data Analysis - Visualization with Matplotlib

 Basic line plot

# Jupyter magic: render interactive plots inside the notebook
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(10)
plt.plot(X)

Create a figure and a subplot in two steps

1. Create the figure

fig = plt.figure()

2. Create a subplot and add it to the figure

ax1 = fig.add_subplot(2, 2, 1)  # 2 rows, 2 columns; select the 1st subplot

Plotting without an explicit subplot (plt.plot)

fig = plt.figure()
plt.plot(np.random.rand(50), 'k-')  # draws on the current (most recently used) subplot; if the figure has no axes yet, one is created

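Creating a figure and a grid of axes in the same line

fig, axes = plt.subplots(2, 3)         # axes is a 2 x 3 NumPy array of subplots
axes[0, 1].hist(np.random.randn(100))  # select a subplot by [row, column]; hist needs data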

Sharing the same x- and y-axis across all subplots

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)

Removing the space between subplots

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)
plt.subplots_adjust(wspace=0, hspace=0)  # spacing is set on the figure, not passed to plt.subplots
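A self-contained sketch that puts the subplot steps above together: a 2 x 3 grid with shared axes and no gaps. It assumes the non-interactive Agg backend so it also runs outside Jupyter, and the random data is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# 2 rows x 3 columns of subplots sharing the same x and y axes
fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)
for i in range(2):
    for j in range(3):
        # axes is a 2-D NumPy array, indexed as [row, column]
        axes[i, j].hist(rng.standard_normal(500), bins=50, color="k", alpha=0.5)

# remove the horizontal and vertical gaps between the subplots
plt.subplots_adjust(wspace=0, hspace=0)
```

In a notebook the figure is displayed automatically; in a script, plt.show() or fig.savefig(...) would be the usual next step.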

Adding color and line style (these are arguments of plot, not of plt.subplots)

data = np.random.randn(30).cumsum()
ax = axes[0, 0]
ax.plot(data, linestyle="--", color="r")
ax.plot(data, "r--")  # short form: color + line style in one format string

Add Marker

ax.plot(data, linestyle="--", color="r", marker="o")

Connecting consecutive points (drawstyle)

ax.plot(data, linestyle="--", color="r", marker="o", drawstyle="steps-post")  # or "default", "steps-pre", "steps-mid", "steps"

Add label

ax.plot(data, linestyle="--", color="r", marker="o", drawstyle="steps-post", label="line")
ax.legend()  # labels are only displayed once a legend is added
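A runnable sketch of the styling options above on a single axes, assuming the Agg backend and a random-walk series as stand-in data. It draws the same data twice: once with keyword arguments and once with the "r--" short form plus a step drawstyle, then shows both labels in a legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(1).standard_normal(30).cumsum()

fig, ax = plt.subplots()
# long form: keyword arguments for color, line style and marker
ax.plot(data, color="r", linestyle="--", marker="o", label="default")
# same color/line style via the "r--" format string, drawn as a step function
ax.plot(data, "r--", drawstyle="steps-post", label="steps-post")
ax.legend(loc="best")  # the label= values appear only after legend() is called
```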


X-axis ticks and tick labels

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(np.random.randn(40).cumsum())
ticks = ax1.set_xticks([0, 10, 20, 30])  # tick positions
labels = ax1.set_xticklabels(["zero", "ten", "twenty", "thirty"])  # one label per tick

Changing the orientation and size of the x-axis tick labels

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(np.random.randn(40).cumsum())
ticks = ax1.set_xticks([0, 10, 20, 30])
labels = ax1.set_xticklabels(["zero", "ten", "twenty", "thirty"], rotation=90, fontsize="large")

X-axis label, Title

ax1.set_xlabel("Xlabels")
ax1.set_title("Title")
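Putting ticks, tick labels, axis labels and a title together in one runnable sketch, assuming the Agg backend and a random walk as placeholder data; the label texts "Steps", "Cumulative sum" and "Random walk" are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(np.random.default_rng(2).standard_normal(40).cumsum())

# fix the tick positions first, then attach one text label per tick
ax1.set_xticks([0, 10, 20, 30])
ax1.set_xticklabels(["zero", "ten", "twenty", "thirty"], rotation=90, fontsize="large")

ax1.set_xlabel("Steps")
ax1.set_ylabel("Cumulative sum")
ax1.set_title("Random walk")
```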

Plots

vertical bar plot

ax1.bar(["Car", "Truck", "Bus", "Auto"], [10, 20, 30, 40])  # categorical + numeric data

horizontal bar plot

ax1.barh(["Car", "Truck", "Bus", "Auto"], [10, 20, 30, 40])  # categorical + numeric data

histogram

ax1.hist(X, bins=50)  # 50 equal-width bins

pie chart

ax1.pie([10,20,30], labels=["car", "bus", "truck"])

scatter plot

ax1.scatter(x,y, marker="^", color="g")

Box/violin plot

ax1.boxplot(X)
ax1.violinplot(X)
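The plot types above can be compared side by side in one figure. This is a sketch that assumes the Agg backend and uses random normal data for the histogram, scatter and box plot; ax.violinplot(values) accepts the same input as the box plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
values = rng.standard_normal(200)

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
axes[0, 0].bar(["Car", "Truck", "Bus", "Auto"], [10, 20, 30, 40])    # vertical bars
axes[0, 1].barh(["Car", "Truck", "Bus", "Auto"], [10, 20, 30, 40])   # horizontal bars
axes[0, 2].hist(values, bins=50)                                     # distribution of one variable
axes[1, 0].pie([10, 20, 30], labels=["car", "bus", "truck"])         # shares of a whole
axes[1, 1].scatter(values[:50], values[50:100], marker="^", color="g")  # x against y
axes[1, 2].boxplot(values)                                           # median, quartiles, outliers
fig.tight_layout()
```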



Thursday, 6 October 2022

1. Python Machine learning - Regularized Linear Model

Linear Regression

from sklearn.linear_model import LinearRegression     
lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.intercept_, lin_reg.coef_)
lin_reg.predict(X_new)

Polynomial Regression

Transform to polynomial feature

from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree =2, include_bias=False)
X_poly = poly_features.fit_transform(X)  # for one input feature x, X_poly has two columns: x and x**2

Now use Linear Regression

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)
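An end-to-end sketch of the polynomial regression steps above on synthetic quadratic data (the generating formula y = 0.5x² + x + 2 + noise is an assumption made for illustration). Because the true coefficients are known, the fitted intercept and coefficients can be checked against them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# synthetic data: y = 0.5*x^2 + x + 2 + Gaussian noise (illustrative only)
rng = np.random.default_rng(42)
m = 100
X = 6 * rng.random((m, 1)) - 3
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 1, m)

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)   # columns: x, x^2

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)  # should be close to 2 and [1, 0.5]
```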

Training and test error

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def plot_learning_curves(model, X, y):
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    train_errors, val_errors = [], []
    for m in range(1, len(X_train)):
        model.fit(X_train[:m], y_train[:m])            # train on the first m samples only
        y_train_predict = model.predict(X_train[:m])   # training error uses the same m samples
        y_val_predict = model.predict(X_val)           # validation error always uses all validation samples
        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))
        val_errors.append(mean_squared_error(y_val, y_val_predict))
    plt.plot(np.sqrt(train_errors), "r-+", linewidth=2, label="train")
    plt.plot(np.sqrt(val_errors), "b-+", linewidth=2, label="validation")
    plt.legend()

Using Pipeline

from sklearn.pipeline import Pipeline

polynomial_regression = Pipeline([
    ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
plot_learning_curves(polynomial_regression, X, y)


Gradient Descent

Batch Gradient Descent

Stochastic Gradient Descent

Mini Batch Gradient Descent
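The three variants listed above differ only in how many training samples are used to compute each gradient step. The sketch below is one way to show that, assuming plain linear regression with an MSE cost on synthetic data (y = 4 + 3x + noise); the helper names mse_gradient and gradient_descent, and the learning rates and epoch counts, are hypothetical choices for illustration. The closed-form least-squares solution is used as a reference.

```python
import numpy as np

# synthetic linear data: y = 4 + 3x + noise (illustrative only)
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X + rng.normal(0, 1, (m, 1))
X_b = np.c_[np.ones((m, 1)), X]          # add x0 = 1 for the intercept term

def mse_gradient(Xb, yb, theta):
    """Gradient of the MSE cost computed on the given (mini-)batch."""
    return 2 / len(Xb) * Xb.T @ (Xb @ theta - yb)

def gradient_descent(batch_size, eta=0.1, n_epochs=200, seed=0):
    """batch_size=m: batch GD; batch_size=1: stochastic GD; in between: mini-batch GD."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(2, 1))      # random initialisation
    for _ in range(n_epochs):
        idx = rng.permutation(m)         # shuffle the samples once per epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            theta -= eta * mse_gradient(X_b[batch], y[batch], theta)
    return theta

theta_batch = gradient_descent(batch_size=m)
theta_sgd = gradient_descent(batch_size=1, eta=0.01, n_epochs=50)
theta_mini = gradient_descent(batch_size=20, eta=0.05)
theta_best = np.linalg.lstsq(X_b, y, rcond=None)[0]   # closed-form reference
```

Batch GD follows the exact gradient and settles on the least-squares solution; with a constant learning rate, SGD and mini-batch GD keep fluctuating around it, which is why a decaying learning-rate schedule is often used in practice.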

Ridge Regression

from sklearn.linear_model import Ridge
ridge_reg = Ridge(alpha=1, solver="cholesky")
ridge_reg.fit(X,y)
ridge_reg.predict([[1.5]])  # predict expects a 2-D array: one row per sample, one column per feature

Lasso Regression (Least Absolute Shrinkage and Selection Operator Regression)

from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha = 0.1)
lasso_reg.fit(X, y)
lasso_reg.predict([[1.5]])

Elastic net

from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X,y)
elastic_net.predict([[1.5]])
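To see what the regularization does, the sketch below fits plain, Ridge, Lasso and Elastic Net regression on synthetic data where only the first of five features matters (the data-generating setup is an assumption for illustration). Lasso's L1 penalty tends to set the coefficients of irrelevant features exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# synthetic data: y depends only on the first of 5 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = 3 * X[:, 0] + rng.normal(0, 0.5, 1000)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:12s}", np.round(model.coef_, 3))

# predict() expects a 2-D array: one row per sample, one column per feature
pred = models["lasso"].predict([[1.5, 0, 0, 0, 0]])
```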







Tuesday, 4 October 2022

7. Data Analysis - Data Aggregation & Grouping

 Split, Apply, Combine

Mean

  • X["Values1"].groupby([X["Keys1"]]).mean() -- mean of "Values1" for each distinct value of "Keys1"
  • X["Values1"].groupby([X["Keys1"],X["Keys2"]]).mean()
  • X.groupby("Keys1").mean()
  • X.groupby([X["Keys1"], X["Keys2"]]).mean()

Count

  • X.groupby([X["Keys1"], X["Keys2"]]).size() -- number of rows in each group
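A small worked example of split-apply-combine with mean and size, assuming a hypothetical five-row DataFrame that uses the column names from the notes above. Grouping by two keys gives a result with a two-level index.

```python
import pandas as pd

X = pd.DataFrame({
    "Keys1": ["a", "a", "b", "b", "a"],
    "Keys2": ["one", "two", "one", "two", "one"],
    "Values1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Values2": [10, 20, 30, 40, 50],
})

# split on Keys1, apply mean to Values1, combine into a Series indexed by the key
mean_by_key1 = X["Values1"].groupby(X["Keys1"]).mean()
# two keys: the result has a hierarchical (two-level) index
mean_by_both = X.groupby(["Keys1", "Keys2"])[["Values1", "Values2"]].mean()
# number of rows in each (Keys1, Keys2) group
counts = X.groupby(["Keys1", "Keys2"]).size()
```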

GroupBy with a FOR loop

  • for name, group in X.groupby("Key1"):
    • print(name)
    • print(group)

Accessing each group through a dictionary keyed by group name

  • dict(list(X.groupby("Key1")))

Grouping columns by data type

  • X.groupby(X.dtypes, axis=1) -- columns are separated according to their data types.

Column Grouping

  • Create a mapping from column to grouping
    • mapColumns={'a':'Group1', 'b':'Group1', 'c':'Group2', 'd':'Group2', 'e':'Group3', 'f':'Group3'}
  • X.groupby(mapColumns, axis=1).mean()

Passing a lambda function to groupby

  • X.groupby(lambda x: x < 'c', axis=1).mean() -- the function is called on each column name, so two groups are created: False and True.

Aggregate function

  • def max_min(x):
    • return x.max() - x.min()
  • X.agg(["max", "min", max_min]) -- max, min and max_min are computed for each column.

Aggregate function with groupby

  • X1 = X.groupby(["key1"]) -- creates a GroupBy object with one group per value of "key1"
  • X1.agg(["min", "max", max_min]) -- applies each aggregate function to every column of every group.

Aggregate functions with custom output column names

  • X1.agg([("Maximum", "max"), ("Minimum", "min"), ("Maximum_Minimum", max_min)])
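A runnable version of the aggregate examples above, assuming a small made-up DataFrame with a "key1" column and two data columns. It shows the shape of each result: a plain agg gives one row per function, a grouped agg gives (column, function) column pairs, and (name, function) tuples rename the outputs.

```python
import pandas as pd

X = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "b"],
    "data1": [1, 4, 2, 8, 5],
    "data2": [10, 30, 20, 20, 60],
})

def max_min(x):
    """Range of a column or group: max minus min."""
    return x.max() - x.min()

# one row per function, one column per data column
summary = X[["data1", "data2"]].agg(["max", "min", max_min])

grouped = X.groupby("key1")
# per group and per column: result columns are (column, function) pairs
per_group = grouped.agg(["min", "max", max_min])
# (output name, function) tuples set the result column names
renamed = grouped["data1"].agg([("Maximum", "max"), ("Minimum", "min"), ("Maximum_Minimum", max_min)])
```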

GroupBy with Apply

  • X.groupby(["key1"]).apply(fName) -- fName receives each group as a DataFrame
  • X.groupby(["key1"]).apply(lambda x: x.min())  -- passing a lambda function; a user-defined function can also be applied.

GroupBy with Apply, passing arguments to the function

  • def minimumFb(X, x):
    • return X.max() - X.min() > x
  • X.groupby(["key1"]).apply(minimumFb, 10) -- extra arguments after the function are passed on to it

Apply function with bucket analysis

  • quartiles = pd.cut(X.Values1, 4) -- 4 equal-width buckets (pd.qcut(X.Values1, 4) gives equal-sized quartiles)
  • X.groupby(quartiles).apply(lambda x: x.min())
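A sketch of passing extra arguments through apply and of bucket analysis with pd.cut, assuming a hypothetical two-column DataFrame. Selecting the "Values1" column before apply keeps the grouping key out of the function's input; observed=False is passed explicitly because the buckets are categorical.

```python
import pandas as pd

X = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "b"],
    "Values1": [1.0, 15.0, 2.0, 4.0, 3.0],
})

def minimumFb(X, x):
    """True if the range (max - min) of the group exceeds the threshold x."""
    return X.max() - X.min() > x

# arguments given after the function are forwarded to it for every group
wide_range = X.groupby("key1")["Values1"].apply(minimumFb, x=10)

# bucket analysis: pd.cut makes equal-width bins (pd.qcut would make equal-count bins)
buckets = pd.cut(X["Values1"], 2)
bucket_min = X["Values1"].groupby(buckets, observed=False).min()
```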

Pivot Table

  • X.pivot_table(values="AvgSizeOfTrip" , index = "Gender" , columns="Group" )

Pivot table with aggregation

  • X.pivot_table(values="AvgSizeOfTrip", index="Gender", columns="Group", aggfunc="sum")
  • X.pivot_table(values=["Col1", "Col2"], index="Gender", columns="Group", aggfunc={"Col1": "sum", "Col2": "mean"}) -- a different aggregation for each column.

Pivot table with count

  • pd.crosstab(X.gender, X.group)  -- This just counts the frequency.
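A small example contrasting pivot_table and crosstab, assuming a made-up five-row trips table that uses the column names from the notes. pivot_table aggregates a value column (mean by default), while crosstab only counts rows per cell.

```python
import pandas as pd

X = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M"],
    "Group": ["A", "A", "B", "B", "B"],
    "AvgSizeOfTrip": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# default aggfunc: mean of AvgSizeOfTrip for each (Gender, Group) cell
mean_table = X.pivot_table(values="AvgSizeOfTrip", index="Gender", columns="Group")
sum_table = X.pivot_table(values="AvgSizeOfTrip", index="Gender", columns="Group", aggfunc="sum")
# crosstab counts how many rows fall into each (Gender, Group) cell
counts = pd.crosstab(X["Gender"], X["Group"])
```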





6. Data Analysis - Data Wrangling

 Hierarchical Indexing in Pandas Series

  • Create multi level indexing in pandas series
    • X = pd.Series(np.random.rand(9), index=[['a','a','a','b','b','c','c','d','d'], [1,2,3,1,3,1,2,2,3]])
  • Access the values
    • X['a']  -- using exact index
    • X[:, 1]  -- using slice
    • X['a':'b', 1] -- using slice
  • Access Index
    • X.index  -- returns a MultiIndex (each entry is a tuple)
  • Changing multi level indexing to row/col 
    • X.unstack()   -- to row/col representation.
    • X.unstack(level = -1) -- first level index will be ROW and second level index will be column
    • X.unstack(level=0) -- first level index will be column and second level index will be row
  • Changing a DataFrame back to a single pandas Series
    • X.stack() -- row labels become the first index level and column labels the second index level.
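A round trip through unstack and stack on a small, made-up two-level Series (values 0 to 5). It shows which level ends up as rows or columns and that stack restores the original layout.

```python
import numpy as np
import pandas as pd

X = pd.Series(np.arange(6.0),
              index=[["a", "a", "a", "b", "b", "b"], [1, 2, 3, 1, 2, 3]])

wide = X.unstack()           # outer level -> rows, inner level -> columns (2 x 3)
wide_t = X.unstack(level=0)  # outer level -> columns, inner level -> rows (3 x 2)
back = wide.stack()          # columns move back into the inner index level
```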

Hierarchical indexing in Pandas DataFrame

  • Create multi-level indexing for both rows and columns
    • X = pd.DataFrame(np.random.rand(9, 6), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]], columns=[["col1","col1","col2","col2","col3","col3"],['c1','c2','c3','c4','c5','c6']])
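  • Access (or set) the names of the row and column levels
    • X.index.names -- [None, None] until names are assigned, e.g. X.index.names = ["Row1", "Row2"]
    • X.columns.names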
  • Access row/column values
    • X.loc[["a", "b"], ["col1", "col2"]] -- outer-level row labels and outer-level column labels

Swap multilevel index

  • X1.swaplevel("Row1", "Row2")  -- levels Row1 and Row2 are interchanged.

Sort index

  • X1.sort_index(level=0)
  • X1.sort_index(level=1)

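Aggregation of data

  • X1.groupby(level="Row1").sum()  -- summation for each Row1 label (sum(level=...) was removed in pandas 2.0)
  • X1.groupby(level="Row2").sum()
  • X1.T.groupby(level="ColName1").sum().T -- summation for each ColName1 label, across columns
  • X1.T.groupby(level="ColName2").sum().T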
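A runnable sketch of level-based aggregation, assuming a made-up 4 x 3 frame whose level names (Row1, Row2, ColName1, ColName2) match the notes. Rows are aggregated with groupby(level=...); for columns, the frame is transposed, grouped and transposed back.

```python
import numpy as np
import pandas as pd

X1 = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["col1", "col1", "col2"], ["c1", "c2", "c1"]],
)
X1.index.names = ["Row1", "Row2"]
X1.columns.names = ["ColName1", "ColName2"]

by_row1 = X1.groupby(level="Row1").sum()           # rows sharing a Row1 label are added
by_row2 = X1.groupby(level="Row2").sum()
by_col1 = X1.T.groupby(level="ColName1").sum().T   # columns sharing a ColName1 label are added
```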

Changing column to index

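  • X.set_index(['b', 'a']) -- columns 'b' and 'a' become the row index and are removed from the columns
  • X.set_index(['b', 'a'], drop=False) -- same, but 'b' and 'a' are also kept as columns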

Change index to column

  • X.reset_index()

Merging data sources

If both have a common column - INNER JOIN

  • pd.merge(df1, df2) -- joins on the column(s) with the same name in both; the remaining columns are placed side by side. (OR)
  • pd.merge(df1, df2, how="inner")

If they do not have a common column

  • pd.merge(df1, df2, left_on="lkey", right_on="rkey") -- name the key column on each side

OUTER JOIN

  • pd.merge(df1, df2, how="outer")  -- INNER JOIN + unmatched rows from both sides (missing values filled with NaN).

LEFT/RIGHT Join

  • pd.merge(df1, df2, how="left")  -- INNER JOIN + uncommon element of left dataframe
  • pd.merge(df1, df2, how="right")  -- INNER JOIN + uncommon element of right dataframe.

It is possible to have more than one column as key.

  • pd.merge(df1, df2, on=["key1", "key2"]) -- rows match only when both key columns match
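A small example of the join types above, assuming two made-up DataFrames that share a "key" column. Counting the rows of each result shows what inner, outer and left joins keep; the last merge renames the key columns to show left_on/right_on.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"], "data1": range(6)})
df2 = pd.DataFrame({"key": ["a", "b", "d"], "data2": range(3)})

inner = pd.merge(df1, df2, on="key")               # only keys in both: a, b
outer = pd.merge(df1, df2, on="key", how="outer")  # all keys; the missing side becomes NaN
left = pd.merge(df1, df2, on="key", how="left")    # every row of df1 is kept

# different key column names on each side
by_name = pd.merge(df1.rename(columns={"key": "lkey"}),
                   df2.rename(columns={"key": "rkey"}),
                   left_on="lkey", right_on="rkey")
```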

Concatenate

  • pd.concat([X, Y], axis=1) -- merged side by side
  • pd.concat([X, Y], axis=0) -- stacked top to bottom.
  • pd.concat([X, Y], ignore_index=True) -- stacked top to bottom, ignoring the original row index; pandas creates a new index 0, 1, 2, etc.
  • pd.concat([X, Y], ignore_index=False) -- the original row index labels are retained (the default).
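A quick check of how pd.concat treats the row index, using two made-up one-column frames that share the row labels "r1" and "r2".

```python
import pandas as pd

X = pd.DataFrame({"A": [1, 2]}, index=["r1", "r2"])
Y = pd.DataFrame({"A": [3, 4]}, index=["r1", "r2"])

stacked = pd.concat([X, Y], axis=0)                 # 4 rows; index r1, r2, r1, r2
renumbered = pd.concat([X, Y], ignore_index=True)   # 4 rows; new index 0..3
side_by_side = pd.concat([X, Y], axis=1)            # 2 rows aligned on the index; columns A, A
```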

Combining two dataframe to fill missing values

  • X1.combine_first(X2) -- Value of X2 will be used to fill the NAN value in dataframe = X1.

Pivot and Melt function

  • X.pivot(index="c1", columns="c2", values="col3") -- 'c1' becomes ROW, 'c2' becomes column and 'col3' becomes value across 'c1' and 'c2'
  • X.pivot(index="c1", columns="c2", values=["col3", "col4"])  -- 'col3' and 'col4' values are stacked side by side in the o/p

Melt function

  • pd.melt(X, ["col1"]) -- three columns result: "col1", "variable" (names of all other columns) and "value" (values of all other columns).
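A sketch of pivot (long to wide) and melt (wide to long), assuming two small made-up frames with the column names used in the notes. The melted frame has exactly the three columns described above.

```python
import pandas as pd

long = pd.DataFrame({
    "c1": ["x", "x", "y", "y"],
    "c2": ["p", "q", "p", "q"],
    "col3": [1, 2, 3, 4],
})
# long -> wide: c1 labels become rows, c2 labels become columns, col3 fills the cells
wide = long.pivot(index="c1", columns="c2", values="col3")

wide_plain = pd.DataFrame({"col1": ["x", "y"], "A": [1, 3], "B": [2, 4]})
# wide -> long: keep col1 as identifier, turn the other columns into variable/value pairs
melted = pd.melt(wide_plain, ["col1"])
```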