Split, Apply, Combine
Mean
- X["Values1"].groupby([X["Keys1"]]).mean() -- Mean of "Values1" based on Keys = "Key1"
- X["Values1"].groupby([X["Keys1"],X["Keys2"]]).mean()
- X.groupBy["Keys1"].mean()
- X.groupBy(X["Keys1"], X["Keys2"]).mean()
Count
- X.groupBy(X["Keys1"], X["Keys2"]).size()
GroupBy Clause with FOR Loop
- for name, group in X.groupby([["Key1"]])
- print (name)
- print(group)
Accessing each group in dictionary key
- dict(list(X.groupby([["Key1"]])))
Extracting based on column
- X.groupby(X.dtypes, axis =1) -- column will be seperated based on data types.
Column Grouping
- Create a mapping from column to grouping
- mapColumns={'a':'Group1', 'b':'Group1', 'c':'Group2', 'd':'Group2', 'e':'Group3', 'f':'Group3'}
- X.groupby(mapColumns, axis=1).mean()
Passing Lambda function in groupby
- X.groupby(lambda x:x<'c', axis =1).mean() -- Two group will be created with TRUE/FALSE column.
Aggregate function
- def max_min(x)
- return x.max() - x.min()
- X.agg([max, min, max_min]) -- For each column min, max, max_min function will be called.
Aggregate function with groupby
- X1 = X.groupby(["key1"]) -- Will create dataframe for each groups in "key1"
- X1.agg(min, max, max_min)) -- Will apply aggregate function for each of that column.
Aggregate function with custom defined column name
- X1.agg(("Maximum", max),("Minimum", min),("Maximum_Minimum",max_min))
GroupBy with Apply
- X.groupby().apply(fName)
- X.groupby(["key1"]).apply(lambda x: x.min()) -- Passing lambda function, User defined function can also be applied.
GroupBy with Apply with passing arguments to function
- def minimumFb(X, x):
- return(X.max()-X.min() > x)
- X.groupby(["key1"]).apply(minimumFb(10))
Apply function with bucket analysis
- quartiles = X.cut(X.Values1, 2)
- X.groupby(quartiles).apply(lambda x:x.min())
Pivot Table
- X.pivot_table(values="AvgSizeOfTrip" , index = "Gender" , columns="Group" )
Pivot table with aggregation
- X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc=np.sum)
- X.pivot_table (values="AvgSizeOfTrip" , index = "Gender" , columns="Group", aggfunc={"Col1":np.sum, 'Col2':np.mean}) - different column with different operation.
Pivot table with count
- pd.crosstab(X.gender, X.group) -- This just counts the frequency.
0 comments:
Post a Comment