Join my course at Udemy (Python Programming Bible-From beginner to advanced )

Blogger templates

Tuesday, 4 October 2022

6. Data Analysis - Data Wrangling

 Hierarchical Indexing in Pandas Series

  • Create multi level indexing in pandas series
    • pd.Series(np.random.rand(9), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]],)
  • Access the values
    • X['a']  -- using exact index
    • X[:, 1]  -- using slice
    • X['a':'b', 1] -- using slice
  • Access Index
    • X.index  -- return tuple
  • Changing multi level indexing to row/col 
    • X.unstack()   -- to row/col representation.
    • X.unstack(level = -1) -- first level index will be ROW and second level index will be column
    • X.unstack(level=0) first level index will be Column and second level index will be row
  • Changing dataframe to single pandas series
    • X.stack() -- to multi level index, row will become 1st level index and column will become second level index.
Hierarchical indexing in Pandas Dataframe
  • Create multi level indexing for both row and columns
    • X=pd.Series(np.random.rand(9, 6), index=[['a','b','c','d','a','b','c','d','a'],[1,2,4,4,5,2,7,8,8]],
    •                                                 columns=[["col1","col1","col2","col2","col3","col3"],['c1','c2', 'c3', 'c4', 'c5', 'c6']])
  • Access name of rows and columns
    • X.index.names
    • X.columns.name
  • Access row/column values
    • X.loc[["row1"],[["col1", "col2"]]]

Swap multilevel index

  • X1.swap_level(['Row", "Row2"])  -- Row1 and Row2 will interchange.

Sort index

  • X1.sort_index(level=0)
  • X1.sort_index(level=1)

Aggregation of data

  • X1.sum(level = 'Row1')  -- Summation for Row1
  • X1.sum(level - 'Row2')
  • X1.sum(level = "ColName1", axis =1) -- Summation for Colname1
  • X1.sum(level = "ColName2", axis =1)

Changing column to index

  • X.set_index(['b', 'a']) 
  • X.set_index(['b', 'a'], drop=False)

Change index to column

  • X.reset_index()

Merging data sources

If have common column - INNER JOIN

  • pd.merge(df1, df2 ) -- should have common column and other column are arranges as seperate column. (OR)
  • pd.merge(df1, df2, how="inner")
If does not have common column
  • pd.merge(df1, df2 , left_on="lkey") 
  • pd.merge(df1, df2 ,  right_on="lkey1") 

OUTER JOIN

  • pd.merge(df1, df2, how="outer")  -- INNER JOIN + uncomon element.

LEFT/RIGHT Join

  • pd.merge(df1, df2, how="left")  -- INNER JOIN + uncommon element of left dataframe
  • pd.merge(df1, df2, how="right")  -- INNER JOIN + uncommon element of right dataframe.

It is possible to have more than one column as key.

  • pd.merge(df1, df2 ,  right_on=["lkey1", "lkey2"]) -- 

Concatenate

  • pd.concatenate([X, Y], axis=1) -- merged side by side
  • pd.concatenate([X,Y], axis =0) -- stacked top to bottom.
  • pd.concatenate([X,Y], ignore_index=True) - is stacked top to bottom ignoring matching of row index and pandas creates own index like 0,1,2 etc
  • pd.concatenate([X,Y], ignore_index=False) - existing index names of original dataframe is retained.

Combining two dataframe to fill missing values

  • X1.combine_first(X2) -- Value of X2 will be used to fill the NAN value in dataframe = X1.

Pivot and Melt function

  • X.pivot(index="c1", columns="c2", value = "col3") -- 'c1' becomes ROW, 'c2' becomes column and 'col3' becomes value across 'c1' and 'c2'
  • X.pivot(index="c1", columns="c2", value = ["col3","col4"])  -- 'col3' and 'col4' values are stacked side by side in the o/p

Melt function

  • pd.melt(X, ["col1"]) -- Three column comes -- "col1", variable(name of all other columns.), values (value of all other columns.)
Share:

0 comments:

Post a Comment

Feature Top (Full Width)

Pageviews

Search This Blog