
Wednesday, 28 September 2022

3. Data Analysis - Pandas Dataframe

Pandas Dataframe creation

Dataframe creation using a dictionary (column values only)

  • data1 = {"State": ["Karnataka", "Jharkhand"], "Year": ["2021", "2022"], "Name": ['ABC', 'DEF']}
  • X = pd.DataFrame(data1)  -- DataFrame with all columns (the dictionary keys)
  • X = pd.DataFrame(data1, columns=["State", "Year"])  -- DataFrame with only the 2 listed columns
  • X = pd.DataFrame(data1, columns=["State", "Year", "JUNK"])  -- "JUNK" is not a key of data1, so it becomes a column filled with NaN (no error is raised)
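
A minimal runnable sketch of the three calls above, using the same data1 dictionary with its keys quoted so it is valid Python. It assumes pandas is installed and imported as pd.

```python
import pandas as pd

# Column values only: pandas assigns a default 0, 1, ... row index
data1 = {"State": ["Karnataka", "Jharkhand"],
         "Year": ["2021", "2022"],
         "Name": ["ABC", "DEF"]}

all_cols = pd.DataFrame(data1)                              # State, Year, Name
two_cols = pd.DataFrame(data1, columns=["State", "Year"])   # only the listed columns
with_junk = pd.DataFrame(data1, columns=["State", "Year", "JUNK"])  # JUNK has no data -> NaN

print(with_junk)
```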

Dataframe creation using a nested dictionary (column and row index values)

  • data1 = {"State": {'one': "Karnataka", 'two': "Jharkhand"}, "Year": {'one': "2021", 'two': "2022"}, "Name": {'one': 'ABC', 'two': 'DEF'}}
  • X = pd.DataFrame(data1)  -- outer keys become the columns, inner keys ('one', 'two') become the row index
  • X = pd.DataFrame(data1, columns=["State", "Year"])
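
A sketch of the nested-dictionary form, assuming every inner dictionary uses the same keys; those inner keys end up as the row labels.

```python
import pandas as pd

# Outer keys -> columns, inner keys -> row index
data1 = {"State": {"one": "Karnataka", "two": "Jharkhand"},
         "Year": {"one": "2021", "two": "2022"},
         "Name": {"one": "ABC", "two": "DEF"}}

X = pd.DataFrame(data1)
X2 = pd.DataFrame(data1, columns=["State", "Year"])   # keep only two columns
print(X2)
```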

By passing the row index separately

  • data1 = {"State": ["Karnataka", "Jharkhand"], "Year": ["2021", "2022"], "Name": ['ABC', 'DEF']}
  • rowIndex = ['one', 'two']
  • X = pd.DataFrame(data1, columns=['State', 'Year'], index=rowIndex)  -- rowIndex needs exactly one label per row
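
The same kind of frame with the row labels passed through index= (a sketch; rowIndex must have one label for each of the two rows):

```python
import pandas as pd

data1 = {"State": ["Karnataka", "Jharkhand"],
         "Year": ["2021", "2022"],
         "Name": ["ABC", "DEF"]}
rowIndex = ["one", "two"]

# Select two columns and attach the row labels in one call
X = pd.DataFrame(data1, columns=["State", "Year"], index=rowIndex)
print(X)
```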

Index and Column Update

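  • Access the row/column labels
    • X.columns
    • X.index
  • Store the column names
    • colName = X.columns
  • Access one particular column (returns a Series)
    • X["State"]
    • X.State  -- attribute access works only when the column name is a valid Python identifier that does not clash with a DataFrame attribute
  • Assign the same value to every row of a column
    • X["State"] = "ABC"
  • Assign a different value to each row of a column (the list length must equal the number of rows)
    • X["State"] = ["AP", "HP", "KA", "TN"]
  • Add a column
    • X['newColumn'] = X['State'] > "AP"  -- adds a Boolean column (True where State sorts after "AP")
  • Delete a column
    • del X["newColumn"]
  • Check if a row/column label is present in the dataframe
    • "newColumn" in X.columns
    • "one" in X.index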
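
A small sketch exercising the column operations above. The two-row frame, the "Country" column and the "JZ" comparison value are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# Hypothetical two-row frame
X = pd.DataFrame({"State": ["Karnataka", "Jharkhand"],
                  "Year": ["2021", "2022"]},
                 index=["one", "two"])

col_labels = list(X.columns)            # ['State', 'Year']
row_labels = list(X.index)              # ['one', 'two']
state_values = X["State"].tolist()      # X.State gives the same column

X["Country"] = "India"                  # a scalar is repeated for every row
X["State"] = ["KA", "JH"]               # a list needs exactly one value per row
X["newColumn"] = X["State"] > "JZ"      # "KA" > "JZ" is True, "JH" > "JZ" is False
flags = X["newColumn"].tolist()
has_new = "newColumn" in X.columns
del X["newColumn"]                      # removes the column in place
has_one = "one" in X.index
```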

Index objects are immutable

Row reindexing - use reindex to change the order of rows

  • X.index[0] = 10  -- Error (TypeError): an Index cannot be modified in place
  • X1 = X.reindex([10, 20, 30, 0, 1, 1, 3])  -- row reindexing. A new DataFrame is returned; labels that are not in X get rows filled with NaN.
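
A sketch showing both points: assigning into an Index raises TypeError, and reindex builds a new frame with NaN rows for unknown labels. The three-row frame is made up for illustration.

```python
import pandas as pd

X = pd.DataFrame({"State": ["KA", "JH", "TN"]}, index=[0, 1, 3])

try:
    X.index[0] = 10              # Index objects cannot be modified in place
    mutated = True
except TypeError:
    mutated = False

# New frame: 10, 20, 30 are not in X, so their rows are NaN;
# label 1 is requested twice, so that row appears twice
X1 = X.reindex([10, 20, 30, 0, 1, 1, 3])
print(X1)
```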

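Column reindexing - use reindex to change the order of columns

  • X1 = X.reindex(columns=["Country", "State"])  -- column reindexing; columns not in X (e.g. "Country") are filled with NaN, and columns that are not listed are dropped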
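
A sketch of column reindexing on a hypothetical two-column frame:

```python
import pandas as pd

X = pd.DataFrame({"State": ["KA", "JH"], "Year": ["2021", "2022"]})

# Requested order wins; "Country" is new (NaN) and "Year" is left out
X1 = X.reindex(columns=["Country", "State"])
print(X1)
```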

Index values can be repeated

  • rowIndex = [0, 1, 1, 1, 3]  -- selecting a repeated label (e.g. X.loc[1]) returns all matching rows
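
A sketch using rowIndex from above; the column name "v" and the values 10 to 50 are arbitrary.

```python
import pandas as pd

rowIndex = [0, 1, 1, 1, 3]
X = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=rowIndex)

unique = X.index.is_unique     # False: label 1 occurs three times
ones = X.loc[1]                # every row labelled 1
print(ones)
```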

Transpose

  • X.T  -- rows become columns and columns become rows
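
Transposing a small illustrative frame:

```python
import pandas as pd

X = pd.DataFrame({"State": ["KA", "JH"], "Year": ["2021", "2022"]},
                 index=["one", "two"])
XT = X.T      # a new frame; X itself is unchanged
print(XT)
```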

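Name the column and row axes (the heading above the labels)

  • X.columns.name = "ColumnName"  -- a label shown above the column headers
  • X.index.name = "IndexName"  -- a label shown above the row labels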
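
A sketch naming both axes on a made-up frame; setting .name is allowed even though the index values themselves cannot be changed in place.

```python
import pandas as pd

X = pd.DataFrame({"State": ["KA", "JH"], "Year": ["2021", "2022"]},
                 index=["one", "two"])
X.columns.name = "ColumnName"   # printed above the column headers
X.index.name = "IndexName"      # printed above the row labels
print(X)
```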

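Element access from a dataframe

Pandas Series object

  • X["row"]  -- by label
  • X[0]  -- by position (recent pandas versions warn about this on a label-indexed Series; prefer X.iloc[0])
  • X[["row1", "row3"]]  -- several labels
  • X[[0, 2]]  -- several positions (prefer X.iloc[[0, 2]])
  • X[X > 50]  -- Boolean filter: values greater than 50
  • X["row1":"row3"]  -- label slice; the end label "row3" IS included
  • X[0:2]  -- position slice; position 2 is NOT included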
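
A sketch of the Series selections above on a made-up four-element Series, using .iloc wherever a position is meant:

```python
import pandas as pd

X = pd.Series([40, 60, 55, 20], index=["row1", "row2", "row3", "row4"])

by_label = X["row1"]             # 40
by_position = X.iloc[0]          # 40
subset = X[["row1", "row3"]]     # list of labels
subset_pos = X.iloc[[0, 2]]      # list of positions
big = X[X > 50]                  # Boolean filter: row2, row3
label_slice = X["row1":"row3"]   # end label included
pos_slice = X[0:2]               # end position excluded
```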

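Pandas Dataframe

  • X["col1"]  -- a single label extracts a column
  • X["col1":"col2"]  -- NOT a column range: a slice inside [] is applied to the rows; use X.loc[:, "col1":"col2"] to extract a range of columns
  • X[1:3]  -- rows at positions 1 and 2 are extracted (position 3 is excluded). This is the source of the confusion: a single label inside [] picks a column, while a slice inside [] picks rows.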
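
A sketch that makes the rule concrete on a hypothetical 4x3 frame (rows row1..row4, columns col1..col3):

```python
import pandas as pd

X = pd.DataFrame({"col1": [1, 2, 3, 4],
                  "col2": [5, 6, 7, 8],
                  "col3": [9, 10, 11, 12]},
                 index=["row1", "row2", "row3", "row4"])

one_col = X["col1"]                  # single label -> a column
some_cols = X[["col1", "col3"]]      # list of labels -> those columns
rows_pos = X[1:3]                    # slice -> ROWS at positions 1 and 2
rows_lbl = X["row1":"row3"]          # label slice -> ROWS row1..row3
col_range = X.loc[:, "col1":"col2"]  # slicing COLUMNS needs loc
```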

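loc/iloc and at/iat syntax

  • X.loc["row1"]  -- row by label
  • X.iloc[0]  -- row by position
  • X.loc["row1", ["col1", "col3"]]  -- one row by label, selected columns by label
  • X.iloc[0, [0, 2]]  -- the same idea by position
  • X[X > 5]  -- Boolean mask; values that fail the condition become NaN
  • X.loc[:"row3"]  -- rows up to and including "row3"
  • X.iloc[:3]  -- first 3 rows (position 3 excluded)
  • X.loc[:"row3", "col1":]  -- rows up to "row3", columns from "col1" onward
  • X.iloc[:3, 2:]  -- first 3 rows, columns from position 2 onward
  • X.at["row3", "col2"]  -- a single value is extracted with at (by labels)
  • X.iat[3, 2]  -- a single value by 0-based positions (4th row, 3rd column)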
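
The same kinds of selection on the hypothetical 4x3 frame, with each label-based call paired with a position-based counterpart that picks the same data:

```python
import pandas as pd

X = pd.DataFrame({"col1": [1, 2, 3, 4],
                  "col2": [5, 6, 7, 8],
                  "col3": [9, 10, 11, 12]},
                 index=["row1", "row2", "row3", "row4"])

r = X.loc["row1"]                       # row by label
r_pos = X.iloc[0]                       # same row by position
part = X.loc["row1", ["col1", "col3"]]  # values 1 and 9
part_pos = X.iloc[0, [0, 2]]            # same values by position
top = X.loc[:"row3"]                    # row1..row3 (end label included)
top_pos = X.iloc[:3]                    # positions 0..2
block = X.loc[:"row3", "col2":]         # row1..row3, col2..col3
block_pos = X.iloc[:3, 1:]              # same block by position
v = X.at["row3", "col2"]                # 7
v_pos = X.iat[2, 1]                     # same value by 0-based positions
```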

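Chained (staged) indexing

  • X.iloc[:3, :2].iloc[:, [0, 1]]  -- selection in stages: the second .iloc works on the result of the first. (Writing X.iloc[:3, :2][:, [0, 1]] raises an error, because plain [] does not accept a row, column pair.)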
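
A sketch of staged selection on the same hypothetical frame; the single-call form at the end selects the same data and is usually simpler to read.

```python
import pandas as pd

X = pd.DataFrame({"col1": [1, 2, 3, 4],
                  "col2": [5, 6, 7, 8],
                  "col3": [9, 10, 11, 12]},
                 index=["row1", "row2", "row3", "row4"])

stage1 = X.iloc[:3, :2]           # first 3 rows, first 2 columns
stage2 = stage1.iloc[:, [0, 1]]   # then columns 0 and 1 of that result
one_step = X.iloc[:3, [0, 1]]     # same selection in one call
```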

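Arithmetic operations

  • X - X1frame  -- element-wise subtraction after aligning labels; when X1frame is a Series, it is matched against the columns and subtracted from every row
  • X1.sub(X1frame)  -- same as X1 - X1frame
  • X1.rsub(X1frame, axis="index")  -- reversed operands (X1frame - X1); axis="index" matches the Series X1frame against the row index, so it is broadcast across the columns
  • X1.add(X1frame, fill_value=99)  -- where a value is missing (NaN) in only one of the two DataFrames, 99 is used in its place; if both are missing, the result is still NaN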
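
A sketch of label alignment and broadcasting. X1frame from the notes is replaced here by three hypothetical objects: a Series matched to the columns (row), a Series matched to the row index (col), and a second DataFrame (other) with a missing value and an extra column.

```python
import pandas as pd

X1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])
row = pd.Series({"a": 10, "b": 20})        # aligned with X1's columns
col = pd.Series({"r1": 100, "r2": 200})    # aligned with X1's row index

diff = X1 - row                      # row subtracted from every row of X1
diff2 = X1.sub(row)                  # same as X1 - row
rev = X1.rsub(col, axis="index")     # col - X1, col broadcast across columns

other = pd.DataFrame({"a": [1, None], "c": [5, 6]}, index=["r1", "r2"])
# Cells missing on only one side are treated as 99 before adding
total = X1.add(other, fill_value=99)
```

With fill_value, r2/a becomes 2 + 99 = 101, and columns present on only one side (b, c) are added to 99 instead of becoming NaN.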

Apply

  • X1.apply(lambda x: x.max())  -- returns the max value of each column
  • X1.apply(lambda x: x.max(), axis="columns")  -- returns the max value of each row
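
A sketch of both directions of apply on a made-up 3x2 frame:

```python
import pandas as pd

X1 = pd.DataFrame({"a": [1, 7, 3], "b": [9, 2, 5]}, index=["r1", "r2", "r3"])

col_max = X1.apply(lambda x: x.max())                  # x is one column
row_max = X1.apply(lambda x: x.max(), axis="columns")  # x is one row
```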

How to get the min/max for each row

  • def minmax(x):
        return pd.Series([x.min(), x.max()], index=["min", "max"])
  • X1.apply(minmax, axis="columns")  -- returns a DataFrame with the min and max of each row
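
The minmax helper above, run on the same kind of made-up frame:

```python
import pandas as pd

def minmax(x):
    # With axis="columns", x is one row of the frame (a Series)
    return pd.Series([x.min(), x.max()], index=["min", "max"])

X1 = pd.DataFrame({"a": [1, 7, 3], "b": [9, 2, 5]}, index=["r1", "r2", "r3"])
result = X1.apply(minmax, axis="columns")   # columns: min, max
print(result)
```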

Applymap

  • X.applymap(lambda x: x * 10)  -- applymap applies the function to every element; here each element is multiplied by 10. (In pandas 2.1 and later, applymap is deprecated in favour of the equivalent X.map.)
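
A sketch that works on pandas versions both before and after the rename: it uses DataFrame.map when it exists, otherwise applymap.

```python
import pandas as pd

X = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = X.map if hasattr(pd.DataFrame, "map") else X.applymap
scaled = elementwise(lambda x: x * 10)   # every element multiplied by 10
print(scaled)
```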

Sorting

  • X.sort_index()  -- sorts by the row index
  • X.sort_index(axis=1)  -- sorts by the column index
  • X.sort_values()  -- sorts by value (for a Series object)
  • X.sort_values(by="col2")  -- sorts a DataFrame by the values in column 'col2'
  • X.sort_values(by=["col2", "col1"])  -- sorts by 'col2', breaking ties with 'col1'
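
A sketch of the sorting calls on a hypothetical frame whose row labels and col2 values are deliberately out of order:

```python
import pandas as pd

X = pd.DataFrame({"col2": [3, 1, 3], "col1": [9, 8, 7]}, index=["c", "a", "b"])

by_rows = X.sort_index()                      # rows a, b, c
by_cols = X.sort_index(axis=1)                # columns col1, col2
s_sorted = X["col1"].sort_values()            # Series: 7, 8, 9
by_col2 = X.sort_values(by="col2")            # row a first (col2 = 1)
by_both = X.sort_values(by=["col2", "col1"])  # c and b tie on col2; col1 puts b first
```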

Rank

  • X.rank()  -- column-wise ranking
  • X.rank(ascending=True)  -- ascending is the default; pass ascending=False to give the largest value rank 1
  • X.rank(axis=1)  -- row-wise ranking
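
A sketch of the ranking options on made-up data; the tie in column b shows that equal values share the average of their ranks (the default method="average").

```python
import pandas as pd

X = pd.DataFrame({"a": [10, 30, 20], "b": [5, 5, 1]})

col_rank = X.rank()                   # rank within each column
desc_rank = X.rank(ascending=False)   # largest value gets rank 1
row_rank = X.rank(axis=1)             # rank within each row
```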

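Summarizing data

  • X.sum()  -- sum of each column
  • X.sum(axis=1)  -- sum of each row
  • X.describe()  -- count, mean, std, min, quartiles and max of each numeric column
  • X.corr()  -- pairwise correlation between columns
  • X["col1"].corr(X["col2"])  -- correlation between two columns
  • X.cov()  -- pairwise covariance between columns
  • X["col1"].cov(X["col2"])  -- covariance between two columns
  • Others (min, max, quantile, mean, median, kurt, skew, cumsum, cummax, cummin, etc.)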
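
A sketch of a few summary methods on made-up data where col2 is exactly twice col1, so the correlation is 1:

```python
import pandas as pd

X = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0],
                  "col2": [2.0, 4.0, 6.0, 8.0]})

col_sums = X.sum()                    # per column
row_sums = X.sum(axis=1)              # per row
stats = X.describe()                  # count, mean, std, min, quartiles, max
corr = X["col1"].corr(X["col2"])      # Pearson correlation
cov = X["col1"].cov(X["col2"])        # sample covariance (divides by n - 1)
running = X.cumsum()                  # running totals down each column
```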

