6. Data Analysis - Data Wrangling ~ Python and Machine Learning Blog

Tuesday, 4 October 2022

6. Data Analysis - Data Wrangling

October 04, 2022 No comments

Hierarchical Indexing in Pandas Series

Create multi level indexing in pandas series

pd.Series(np.random.rand(9), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]],)

Access the values

X['a'] -- using exact index
X[:, 1] -- using slice
X['a':'b', 1] -- using slice

Access Index

X.index -- return tuple

Changing multi level indexing to row/col

X.unstack() -- to row/col representation.
X.unstack(level = -1) -- first level index will be ROW and second level index will be column
X.unstack(level=0) - first level index will be Column and second level index will be row

Changing dataframe to single pandas series

X.stack() -- to multi level index, row will become 1st level index and column will become second level index.

Hierarchical indexing in Pandas Dataframe

Create multi level indexing for both row and columns

X=pd.Series(np.random.rand(9, 6), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]],
columns=[["col1","col1","col2","col2","col3","col3"],['c1','c2', 'c3', 'c4', 'c5', 'c6']])

Access name of rows and columns

X.index.names
X.columns.name

Access row/column values

X.loc[["row1"],[["col1", "col2"]]]

Swap multilevel index

X1.swap_level(['Row", "Row2"]) -- Row1 and Row2 will interchange.

Sort index

X1.sort_index(level=0)
X1.sort_index(level=1)

Aggregation of data

X1.sum(level = 'Row1') -- Summation for Row1
X1.sum(level - 'Row2')
X1.sum(level = "ColName1", axis =1) -- Summation for Colname1
X1.sum(level = "ColName2", axis =1)

Changing column to index

X.set_index(['b', 'a'])
X.set_index(['b', 'a'], drop=False)

Change index to column

X.reset_index()

Merging data sources

If have common column - INNER JOIN

pd.merge(df1, df2 ) -- should have common column and other column are arranges as seperate column. (OR)
pd.merge(df1, df2, how="inner")

If does not have common column

pd.merge(df1, df2 , left_on="lkey")
pd.merge(df1, df2 , right_on="lkey1")

OUTER JOIN

pd.merge(df1, df2, how="outer") -- INNER JOIN + uncomon element.

LEFT/RIGHT Join

pd.merge(df1, df2, how="left") -- INNER JOIN + uncommon element of left dataframe
pd.merge(df1, df2, how="right") -- INNER JOIN + uncommon element of right dataframe.

It is possible to have more than one column as key.

pd.merge(df1, df2 , right_on=["lkey1", "lkey2"]) --

Concatenate

pd.concatenate([X, Y], axis=1) -- merged side by side
pd.concatenate([X,Y], axis =0) -- stacked top to bottom.
pd.concatenate([X,Y], ignore_index=True) - is stacked top to bottom ignoring matching of row index and pandas creates own index like 0,1,2 etc
pd.concatenate([X,Y], ignore_index=False) - existing index names of original dataframe is retained.

Combining two dataframe to fill missing values

X1.combine_first(X2) -- Value of X2 will be used to fill the NAN value in dataframe = X1.

Pivot and Melt function

X.pivot(index="c1", columns="c2", value = "col3") -- 'c1' becomes ROW, 'c2' becomes column and 'col3' becomes value across 'c1' and 'c2'
X.pivot(index="c1", columns="c2", value = ["col3","col4"]) -- 'col3' and 'col4' values are stacked side by side in the o/p

Melt function

pd.melt(X, ["col1"]) -- Three column comes -- "col1", variable(name of all other columns.), values (value of all other columns.)

Share:

0 comments:

Post a Comment