DataFrame.aggregate
- DataFrame.aggregate(func: str | List[str], axis: int = 0, numeric_only: bool | None = None, *args, **kwargs) → Series | DataFrame
Aggregate using one or more operations over the specified axis.
Parameters
- func: function, str, list or dict
Function to use for aggregating the data. If a function, it must work either when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
function
string function name
list of functions and/or function names, e.g. [np.sum, 'mean']
dict of axis labels -> functions, function names or list of such.
Currently, only the following function names are supported:
['count', 'mad', 'max', 'mean', 'median', 'min', 'mode', 'quantile', 'rank', 'sem', 'skew', 'sum', 'std', 'var']
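Because only a fixed set of names is accepted, checking them on the client side can catch typos before a request is sent. The sketch below is illustrative only: `SUPPORTED_AGGS` and `normalize_funcs` are hypothetical names that are not part of the library, and the name list is simply copied from the line above.

```python
from typing import List, Union

# Names copied from the supported list in this documentation (assumption:
# the list is complete for the installed version).
SUPPORTED_AGGS = frozenset([
    "count", "mad", "max", "mean", "median", "min", "mode",
    "quantile", "rank", "sem", "skew", "sum", "std", "var",
])


def normalize_funcs(func: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: return func as a list, rejecting unsupported names."""
    names = [func] if isinstance(func, str) else list(func)
    unsupported = [name for name in names if name not in SUPPORTED_AGGS]
    if unsupported:
        raise ValueError(f"Unsupported aggregation(s): {unsupported}")
    return names
```

The returned list could then be passed as the func argument, for example `df.aggregate(normalize_funcs(['sum', 'min', 'std']))`.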
- axis: int
Currently, only axis=0 (index) is supported.
- numeric_only: {True, False, None}, default None
Which data type is returned:
True: returns all values as float64; NaN/NaT values are ignored.
False: returns all values as float64.
None: returns all values with their default data type.
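In the examples on this page, numeric_only=True also limits the result to the numeric columns. A rough local analogue in plain pandas is sketched below. It is not the library's implementation; it assumes pandas is installed and uses a small made-up frame rather than the flights index.

```python
import pandas as pd

# Made-up sample data standing in for two columns of the flights index.
df = pd.DataFrame({
    "AvgTicketPrice": [100.0, 250.0, 400.0],
    "DestCountry": ["IT", "US", "IT"],
})

# Keep only numeric columns, then aggregate; this mimics the column
# filtering seen with numeric_only=True in the examples.
numeric = df.select_dtypes(include="number")
result = numeric.agg(["sum", "min"])
```

Here `result` has a single column, AvgTicketPrice, with one row per aggregation.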
- *args
Positional arguments to pass to func
- **kwargs
Keyword arguments to pass to func
Returns
- DataFrame, Series or scalar
If DataFrame.agg is called with a single function, returns a Series.
If DataFrame.agg is called with several functions, returns a DataFrame.
If Series.agg is called with a single function, returns a scalar.
If Series.agg is called with several functions, returns a Series.
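The return-shape rules above follow the pandas convention, so they can be tried locally without a cluster. The sketch below uses plain pandas as a stand-in and made-up data; it assumes the oml objects return the shapes described above and does not exercise the library itself.

```python
import pandas as pd

# Made-up sample data; column names are illustrative.
df = pd.DataFrame({
    "price": [100.0, 250.0, 400.0],
    "distance": [0.0, 1200.0, 5400.0],
})

single = df.agg("sum")                    # DataFrame, one function -> Series
several = df.agg(["sum", "min"])          # DataFrame, several functions -> DataFrame
scalar = df["price"].agg("sum")           # Series, one function -> scalar
series = df["price"].agg(["sum", "min"])  # Series, several functions -> Series
```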
Examples
>>> from tests import OPENSEARCH_TEST_CLIENT
>>> df = oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights', columns=['AvgTicketPrice', 'DistanceKilometers', 'timestamp', 'DestCountry'])
>>> df.aggregate(['sum', 'min', 'std'], numeric_only=True).astype(int)
     AvgTicketPrice  DistanceKilometers
sum         8204364            92616288
min             100                   0
std             266                4578

>>> df.aggregate(['sum', 'min', 'std'], numeric_only=True)
     AvgTicketPrice  DistanceKilometers
sum    8.204365e+06        9.261629e+07
min    1.000205e+02        0.000000e+00
std    2.664071e+02        4.578614e+03

>>> df.aggregate(['sum', 'min', 'std'], numeric_only=False)
     AvgTicketPrice  DistanceKilometers   timestamp  DestCountry
sum    8.204365e+06        9.261629e+07         NaT          NaN
min    1.000205e+02        0.000000e+00  2018-01-01          NaN
std    2.664071e+02        4.578614e+03         NaT          NaN

>>> df.aggregate(['sum', 'min', 'std'], numeric_only=None)
     AvgTicketPrice  DistanceKilometers   timestamp  DestCountry
sum    8.204365e+06        9.261629e+07         NaT          NaN
min    1.000205e+02        0.000000e+00  2018-01-01          NaN
std    2.664071e+02        4.578614e+03         NaT          NaN