DataFrame.groupby

DataFrame.groupby(by: str | List[str] | None = None, dropna: bool = True) → DataFrameGroupBy

Group the DataFrame by one or more columns so that aggregations can be computed per group.

Parameters

by:

Column or list of columns to group by. Currently, only a column name or a list of column names is accepted.

dropna: default True

If True, and if group keys contain NA values, those NA values are dropped together with the corresponding row/column.

Returns

opensearch_py_ml.groupby.DataFrameGroupBy

See Also

:pandas_api_docs:`pandas.DataFrame.groupby`

Examples

>>> import opensearch_py_ml as oml
>>> from tests import OPENSEARCH_TEST_CLIENT
>>> oml_flights = oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights', columns=["AvgTicketPrice", "Cancelled", "dayOfWeek", "timestamp", "DestCountry"])
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).agg(["min", "max"], numeric_only=True) 
                      AvgTicketPrice              dayOfWeek
                                 min          max       min  max
DestCountry Cancelled
AE          False         110.799911  1126.148682       0.0  6.0
            True          132.443756   817.931030       0.0  6.0
AR          False         125.589394  1199.642822       0.0  6.0
            True          251.389603  1172.382568       0.0  6.0
AT          False         100.020531  1181.835815       0.0  6.0
...                              ...          ...       ...  ...
TR          True          307.915649   307.915649       0.0  0.0
US          False         100.145966  1199.729004       0.0  6.0
            True          102.153069  1192.429932       0.0  6.0
ZA          False         102.002663  1196.186157       0.0  6.0
            True          121.280296  1175.709961       0.0  6.0

[63 rows x 4 columns]
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).mean(numeric_only=True) 
                       AvgTicketPrice  dayOfWeek
DestCountry Cancelled
AE          False          643.956793   2.717949
            True           388.828809   2.571429
AR          False          673.551677   2.746154
            True           682.197241   2.733333
AT          False          647.158290   2.819936
...                               ...        ...
TR          True           307.915649   0.000000
US          False          598.063146   2.752014
            True           579.799066   2.767068
ZA          False          636.998605   2.738589
            True           677.794078   2.928571

[63 rows x 2 columns]
>>> oml_flights.groupby(["DestCountry", "Cancelled"]).min(numeric_only=False) 
                       AvgTicketPrice  dayOfWeek           timestamp
DestCountry Cancelled
AE          False          110.799911          0 2018-01-01 19:31:30
            True           132.443756          0 2018-01-06 13:03:25
AR          False          125.589394          0 2018-01-01 01:30:47
            True           251.389603          0 2018-01-01 02:13:17
AT          False          100.020531          0 2018-01-01 05:24:19
...                               ...        ...                 ...
TR          True           307.915649          0 2018-01-08 04:35:10
US          False          100.145966          0 2018-01-01 00:06:27
            True           102.153069          0 2018-01-01 09:02:36
ZA          False          102.002663          0 2018-01-01 06:44:44
            True           121.280296          0 2018-01-04 00:37:01

[63 rows x 3 columns]
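
The effect of the dropna parameter can be illustrated without an OpenSearch cluster. The sketch below uses plain pandas (see pandas.DataFrame.groupby under See Also), not opensearch_py_ml, and assumes that opensearch_py_ml follows the same semantics for NA group keys. The toy data and the variable names `df`, `dropped`, and `kept` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row has a missing (NA) group key.
df = pd.DataFrame(
    {
        "DestCountry": ["US", "US", None, "ZA"],
        "AvgTicketPrice": [100.0, 300.0, 500.0, 200.0],
    }
)

# dropna=True (the default): the row whose group key is NA is excluded.
dropped = df.groupby("DestCountry", dropna=True)["AvgTicketPrice"].mean()

# dropna=False: the NA key is kept and forms a group of its own.
kept = df.groupby("DestCountry", dropna=False)["AvgTicketPrice"].mean()

print(dropped)
print(kept)
```

With dropna=True the result has one entry per non-NA key, so the 500.0 row does not contribute to any group. With dropna=False that row appears as an additional group whose key is NaN.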