Basic line plot
%matplotlib notebook -- to create visualization in jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
X = np.arange(10)
plt.plot(X)
Create figure and Plot in two lines
1. Create Figure
fig = plt.figure()
2. Create Plot and add plot to figure
ax1 = fig.add_subplot(2,2,1) --- 2 rows, 2 column and we are selecting...
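As a rough sketch of the figure-then-subplot steps in this excerpt, the snippet below puts two plots on a 2x2 grid. The second subplot, the squared data, and the "Agg" backend (chosen only so the code runs outside a notebook) are assumptions for illustration and are not part of the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

X = np.arange(10)

# Step 1: create the figure
fig = plt.figure()

# Step 2: create plots and add them to the figure (2 rows, 2 columns)
ax1 = fig.add_subplot(2, 2, 1)  # first cell of the grid
ax2 = fig.add_subplot(2, 2, 2)  # second cell (hypothetical extra plot)

ax1.plot(X)       # y = 0..9 against the default x positions
ax2.plot(X ** 2)  # illustrative second series
```

Cells are numbered row by row, so position 1 is the top-left cell and position 2 is the one to its right.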
Friday, 7 October 2022
Thursday, 6 October 2022
1. Python Machine learning - Regularized Linear Model
Linear Regression
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
print(lin_reg.intercept_, lin_reg.coef_)
lin_reg.predict(X_new)
Polynomial Regression
Transform to polynomial feature
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly...
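The excerpt is cut off at X_poly, so the following is only one common way such a pipeline is completed, not necessarily what the post does. It assumes noise-free synthetic data (y = 0.5x^2 + x + 2) and uses fit_transform to build the polynomial features before fitting a plain LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data, invented for this sketch: y = 0.5*x^2 + x + 2, no noise
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2

# Expand the single feature x into the two columns [x, x^2]
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

# Fit an ordinary linear model on the expanded features
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)

# New inputs must go through the same transformation before predict
X_new = np.array([[2.0]])
y_new = lin_reg.predict(poly_features.transform(X_new))
```

Because the data has no noise, the fitted intercept and coefficients should land very close to the values used to generate it.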
Tuesday, 4 October 2022
7. Data Analysis - Data Aggregation & Grouping
Split, Apply, Combine
Mean
X["Values1"].groupby(X["Keys1"]).mean() -- mean of "Values1" for each value of "Keys1"
X["Values1"].groupby([X["Keys1"], X["Keys2"]]).mean()
X.groupby("Keys1").mean() -- pass numeric_only=True if other columns are non-numeric
X.groupby(["Keys1", "Keys2"]).mean()
Count
X.groupby(["Keys1", "Keys2"]).size()
GroupBy Clause with FOR Loop
for name, group in X.groupby("Keys1"):
    print...
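A small sketch of the split, apply, combine calls listed in this excerpt. The DataFrame contents are made up for illustration; only the column names "Keys1", "Keys2" and "Values1" come from the excerpt.

```python
import pandas as pd

# Hypothetical data; column names follow the excerpt
X = pd.DataFrame({
    "Keys1": ["a", "a", "b", "b", "a"],
    "Keys2": ["one", "two", "one", "two", "one"],
    "Values1": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Mean of Values1 per Keys1 group
mean_by_key1 = X["Values1"].groupby(X["Keys1"]).mean()

# Mean of Values1 per (Keys1, Keys2) pair; result has a two-level index
mean_by_both = X.groupby(["Keys1", "Keys2"])["Values1"].mean()

# Number of rows in each (Keys1, Keys2) group
sizes = X.groupby(["Keys1", "Keys2"]).size()

# Iterating yields (group key, sub-DataFrame) pairs
for name, group in X.groupby("Keys1"):
    print(name, len(group))
```

Selecting the value column before calling mean() avoids errors from non-numeric columns in recent pandas versions.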
6. Data Analysis - Data Wrangling
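Hierarchical Indexing in Pandas Series
Create multi level indexing in pandas series
X = pd.Series(np.random.rand(9), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]])
X = X.sort_index() -- label slices on a multi level index need a sorted index
Access the values
X.loc['a'] -- using exact index
X.loc[:, 1] -- using slice on the outer level, exact label on the inner level
X.loc['a':'b', 1] -- using slice
Access Index
X.index -- returns a MultiIndex whose entries are tuples
Changing multi level...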
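The access patterns in this excerpt can be tried on a small Series whose index is already sorted. The labels and values below are invented for the sketch, and the unstack() call at the end is an addition not shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a sorted two-level index (values 0.0 .. 8.0)
X = pd.Series(
    np.arange(9.0),
    index=[
        ["a", "a", "a", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 3, 1, 2, 2, 3],
    ],
)

outer_a = X.loc["a"]        # all entries under outer label 'a'
inner_1 = X.loc[:, 1]       # every outer label, inner label 1
range_ab = X.loc["a":"b"]   # label slice on the outer level (end inclusive)

# Pivot the inner level into columns; missing pairs become NaN
wide = X.unstack()
```

Label slices such as "a":"b" rely on the index being sorted, which is why the sample index is built in order.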
Thursday, 29 September 2022
5. Data Analysis - Data Preparation
Check Missing data
X.isnull()
X.notnull()
Delete Missing Data
X.dropna() -- delete any row that has missing data
X.dropna(how="all") -- delete a row only if every column in it is missing
X.dropna(axis=1) -- delete any column that has missing data
X.dropna(axis=1, how="all") -- delete a column only if every value in it is missing
Impute missing value
X.fillna(99)...
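To see how the dropna variants in this excerpt differ, here is a minimal sketch on a hand-made DataFrame. The data is hypothetical and chosen so that each call gives a different result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column c is entirely missing, row 2 is entirely missing
X = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

mask = X.isnull()                        # True where a value is missing

rows_any = X.dropna()                    # drops rows with at least one NaN
rows_all = X.dropna(how="all")           # drops rows where every value is NaN
cols_all = X.dropna(axis=1, how="all")   # drops columns where every value is NaN

filled = X.fillna(99)                    # replace every NaN with 99
```

Here dropna() with no arguments leaves nothing, because column c makes every row incomplete, while how="all" removes only the last row.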
4. Data Analysis - Data Ingestion
Data Ingestion
pd.read_csv("XYZ.csv")
pd.read_table("XYZ.csv", sep=",")
pd.read_table("XYZ.csv", sep=",", header=None) -- pandas assigns default integer column names.
pd.read_table("XYZ.csv", sep=",", names=['a', 'b', 'c', 'd', 'e']) --- provide column names.
pd.read_table("XYZ.csv", sep=",", names=['a', 'b', 'c', 'd', 'e'], index_col="e") ---...
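The header and names options can be checked without a file on disk by reading from an in-memory string. The CSV text below is invented for the sketch, and io.StringIO stands in for "XYZ.csv".

```python
import io
import pandas as pd

# Hypothetical header-less CSV content standing in for "XYZ.csv"
csv_text = "1,2,3,4,hello\n5,6,7,8,world\n"
cols = ["a", "b", "c", "d", "e"]

# header=None: pandas numbers the columns 0, 1, 2, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=...: supply the column names yourself
named = pd.read_csv(io.StringIO(csv_text), names=cols)

# index_col must refer to one of the supplied names
indexed = pd.read_csv(io.StringIO(csv_text), names=cols, index_col="e")
```

Passing names on its own is enough for pandas to treat the first line as data rather than as a header.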
Wednesday, 28 September 2022
3. Data Analysis - Pandas Dataframe
Pandas Dataframe creation
Dataframe creation using dictionary (with only column values)
data1 = {"State": ["Karnataka", "Jharkhand"], "Year": ["2021", "2022"], "Name": ["ABC", "DEF"]}
X = pd.DataFrame(data1) ------- dataframe creation with all features
X = pd.DataFrame(data1, columns=["State", "Year"]) ---- dataframe creation with 2 features
X = pd.DataFrame(data1,...
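A runnable version of the dictionary-based creation in this excerpt. The third call in the excerpt is truncated, so the index= example here is only a guess at a typical next step, and the row labels "r1" and "r2" are hypothetical.

```python
import pandas as pd

data1 = {
    "State": ["Karnataka", "Jharkhand"],
    "Year": ["2021", "2022"],
    "Name": ["ABC", "DEF"],
}

# All three keys become columns
X_all = pd.DataFrame(data1)

# Keep only the listed columns, in the listed order
X_two = pd.DataFrame(data1, columns=["State", "Year"])

# Assumption: custom row labels via index= (not confirmed by the excerpt)
X_labeled = pd.DataFrame(data1, index=["r1", "r2"])
```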
Tuesday, 27 September 2022
2. Data Analysis - Pandas Series
Series Creation
Series creation with default index
X = pd.Series([10,20,30,40]) -- by passing a list
X.index, X.values -- these are attributes, not methods
print(X[0], X[[0,2,3]], X[1:3]) --- Access the series value
Series creation with labeled index
X = pd.Series([10,20,30,40], index=['l1', 'l2', 'l3', 'l4'])
X['l1'], X[['l1', 'l2']], X['l2':'l4']
Series creation using dictionary
pd.Series(dict1)...
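The sketch below runs the Series access patterns from this excerpt. The contents of dict1 are hypothetical, since the excerpt does not show them, and iloc is used for the positional slice as an explicit alternative to plain brackets.

```python
import pandas as pd

# Default integer index 0..3
X = pd.Series([10, 20, 30, 40])
pos = X.iloc[1:3]              # positional slice, end excluded

# Labeled index
X_lab = pd.Series([10, 20, 30, 40], index=["l1", "l2", "l3", "l4"])
first = X_lab["l1"]            # single label
pair = X_lab[["l1", "l2"]]     # list of labels
span = X_lab["l2":"l4"]        # label slice, end included

# From a dictionary: keys become the index (dict1 contents are made up)
dict1 = {"x": 1, "y": 2}
from_dict = pd.Series(dict1)
```

Note the asymmetry: the positional slice stops before position 3, while the label slice includes "l4".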
Monday, 26 September 2022
1. Data Analysis - NumPy Operations
Numpy Operations
Numpy Array Creation
X = np.array([10,20,30,40])
X = np.zeros(10)
X = np.ones(10)
X = np.empty(10)
X = np.arange(1, 11) ---- will create 10 element array starting from element=1 to 10
Array Creation using data type
X = np.array([10,20,30,40], dtype=np.float64)
Changing data type
X = X.astype(np.int32)
Arithmetic...
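A short sketch of the array creation and dtype calls from this excerpt. The arithmetic part of the post is truncated, so the element-wise doubling at the end is just an assumed example of array arithmetic.

```python
import numpy as np

zeros = np.zeros(10)       # ten 0.0 values
seq = np.arange(1, 11)     # 1, 2, ..., 10 (stop value is excluded)

# Set the data type at creation time
X = np.array([10, 20, 30, 40], dtype=np.float64)

# Convert to another data type; astype returns a new array
as_int = X.astype(np.int32)

# Assumption: a typical element-wise operation, not taken from the post
doubled = X * 2
```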