Provide the groupby split-apply-combine paradigm. Define the GroupBy
class providing the base-class of operations.
The SeriesGroupBy and DataFrameGroupBy sub-classes
(defined in pandas.core.groupby.generic)
expose these user-facing objects to provide specific functionality.
See Also
--------
Series.%(name)s : Apply a function %(name)s to a Series.
DataFrame.%(name)s : Apply a function %(name)s
to each row or column of a DataFrame.
Apply function ``func`` group-wise and combine the results together.
The function passed to ``apply`` must take a {input} as its first
argument and return a DataFrame, Series or scalar. ``apply`` will
then take care of combining the results back together into a single
dataframe or series. ``apply`` is therefore a highly flexible
grouping method.
While ``apply`` is a very flexible method, its downside is that
using it can be quite a bit slower than using more specific methods
like ``agg`` or ``transform``. Pandas offers a wide range of methods that will
be much faster than using ``apply`` for their specific purposes, so try to
use them before reaching for ``apply``.
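As an illustration of that trade-off, the sketch below computes the same per-group totals twice, once through ``apply`` and once through the built-in ``sum``. The frame and its column names are assumptions chosen to mirror the examples in this docstring, and no timing figures are claimed.

```python
import pandas as pd

# Illustrative data (an assumption, mirroring the docstring examples).
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
grouped = df.groupby("A")[["B", "C"]]

# Generic path: ``apply`` calls the Python function once per group.
via_apply = grouped.apply(lambda g: g.sum())

# Specific path: the dedicated reduction for the same computation.
via_sum = grouped.sum()

# Both spellings yield the same per-group totals.
same_totals = bool((via_apply.values == via_sum.values).all())
```

When a dedicated method such as ``sum`` expresses the computation, it is usually the one to prefer; ``apply`` remains useful for logic that has no specific counterpart.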
Parameters
----------
func : callable
A callable that takes a {input} as its first argument, and
returns a dataframe, a series or a scalar. In addition the
callable may take positional and keyword arguments.
include_groups : bool, default True
When True, will attempt to apply ``func`` to the groupings in
the case that they are columns of the DataFrame. If this raises a
TypeError, the result will be computed with the groupings excluded.
When False, the groupings will be excluded when applying ``func``.
.. versionadded:: 2.2.0
.. deprecated:: 2.2.0
Setting include_groups to True is deprecated. Only the value
False will be allowed in a future version of pandas.
args, kwargs : tuple and dict
Optional positional and keyword arguments to pass to ``func``.
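A minimal sketch of how extra arguments reach ``func``, assuming a hypothetical helper ``scaled_range`` and illustrative data. The grouping column is excluded here by selecting columns explicitly, which sidesteps the ``include_groups`` deprecation described above.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def scaled_range(group, column, scale=1):
    """Hypothetical helper: max-min spread of one column, times ``scale``."""
    return (group[column].max() - group[column].min()) * scale


# "C" is forwarded positionally and ``scale`` as a keyword; each group
# arrives as the first argument.
result = df.groupby("A")[["B", "C"]].apply(scaled_range, "C", scale=10)
```

Because ``scaled_range`` returns a scalar, the combined result is a Series indexed by the group labels.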
Returns
-------
Series or DataFrame
See Also
--------
pipe : Apply function to the full GroupBy object instead of to each
group.
aggregate : Apply aggregate function to the GroupBy object.
transform : Apply function column-by-column to the GroupBy object.
Series.apply : Apply a function to a Series.
DataFrame.apply : Apply a function to each row or column of a DataFrame.
Notes
-----
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``,
see the examples below.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
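One way to stay within that restriction is to have ``func`` return a new object instead of modifying the group it receives. The sketch below assumes a hypothetical helper ``add_share`` and illustrative data; ``DataFrame.assign`` returns a copy, so the original frame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3]})


def add_share(group):
    """Hypothetical helper: add each row's share of the group total."""
    # ``assign`` builds a new DataFrame rather than mutating ``group``.
    return group.assign(share=group["B"] / group["B"].sum())


result = df.groupby("A", group_keys=False)[["B"]].apply(add_share)
```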
Examples
--------
{examples}
>>> df = pd.DataFrame({'A': 'a a b'.split(),
... 'B': [1, 2, 3],
... 'C': [4, 6, 5]})
>>> g1 = df.groupby('A', group_keys=False)
>>> g2 = df.groupby('A', group_keys=True)
Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
differ in their ``group_keys`` argument. Calling `apply` in various ways,
we can get different grouping results:
Example 1: below the function passed to `apply` takes a DataFrame as
its argument and returns a DataFrame. `apply` combines the result for
each group together into a new DataFrame:
>>> g1[['B', 'C']].apply(lambda x: x / x.sum())
B C
0 0.333333 0.4
1 0.666667 0.6
2 1.000000 1.0
In the above, the groups are not part of the index. We can have them included
by using ``g2`` where ``group_keys=True``:
>>> g2[['B', 'C']].apply(lambda x: x / x.sum())
B C
A
a 0 0.333333 0.4
1 0.666667 0.6
b 2 1.000000 1.0
Example 2: The function passed to `apply` takes a DataFrame as
its argument and returns a Series. `apply` combines the result for
each group together into a new DataFrame.
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``.
>>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
>>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
B C
A
a 1.0 2.0
b 0.0 0.0
The ``group_keys`` argument has no effect here because the result is not
like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
to the input.
Example 3: The function passed to `apply` takes a DataFrame as
its argument and returns a scalar. `apply` combines the result for
each group together into a Series, including setting the index as
appropriate:
>>> g1.apply(lambda x: x.C.max() - x.B.min(), include_groups=False)
A
a 5
b 2
dtype: int64
>>> s = pd.Series([0, 1, 2], index='a a b'.split())
>>> g1 = s.groupby(s.index, group_keys=False)
>>> g2 = s.groupby(s.index, group_keys=True)
From ``s`` above we can see that both groupings have two groups, ``a`` and ``b``.
Notice that ``g1`` and ``g2`` differ only in their ``group_keys`` argument.
Calling `apply` in various ways, we can get different grouping results:
Example 1: The function passed to `apply` takes a Series as
its argument and returns a Series. `apply` combines the result for
each group together into a new Series.
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``.
>>> g1.apply(lambda x: x * 2 if x.name == 'a' else x / 2)
a 0.0
a 2.0
b 1.0
dtype: float64
In the above, the groups are not part of the index. We can have them included
by using ``g2`` where ``group_keys=True``:
>>> g2.apply(lambda x: x * 2 if x.name == 'a' else x / 2)
a a 0.0
a 2.0
b b 1.0
dtype: float64
Example 2: The function passed to `apply` takes a Series as
its argument and returns a scalar. `apply` combines the result for
each group together into a Series, including setting the index as
appropriate:
>>> g1.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
The ``group_keys`` argument has no effect here because the result is not
like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
to the input.
>>> g2.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
Compute {fname} of group values.
Parameters
----------
numeric_only : bool, default {no}
Include only float, int, boolean columns.
.. versionchanged:: 2.0.0
numeric_only no longer accepts ``None``.
min_count : int, default {mc}
The required number of valid values to perform the operation. If fewer
than ``min_count`` non-NA values are present the result will be NA.
Returns
-------
Series or DataFrame
Computed {fname} of values within each group.
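The effect of ``min_count`` can be sketched with ``sum`` standing in for ``{fname}``. The data is an assumption: group ``b`` holds only a missing value, so it has no valid values to add up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan], index=["a", "a", "b"])
grouped = s.groupby(level=0)

# With the default for ``sum`` (min_count=0), an all-NA group sums to 0.0.
default_sum = grouped.sum()

# Requiring at least one valid value turns that group's result into NA.
strict_sum = grouped.sum(min_count=1)
```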
Examples
--------
{example}
Compute {fname} of group values.
Parameters
----------
numeric_only : bool, default {no}
Include only float, int, boolean columns.
.. versionchanged:: 2.0.0
numeric_only no longer accepts ``None``.
min_count : int, default {mc}
The required number of valid values to perform the operation. If fewer
than ``min_count`` non-NA values are present the result will be NA.
engine : str, default None {e}
* ``'cython'`` : Runs the operation through C-extensions from cython.
* ``'numba'`` : Runs the operation through JIT compiled code from numba.
Only available when ``raw`` is set to ``True``.
* ``None`` : Defaults to ``'cython'`` or the global setting ``compute.use_numba``
engine_kwargs : dict, default None {ek}
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{{'nopython': True, 'nogil': False, 'parallel': False}}`` and will be
applied to both the ``func`` and the ``apply`` groupby aggregation.
Returns
-------
Series or DataFrame
Computed {fname} of values within each group.
Examples
--------
{example}
Apply a ``func`` with arguments to this %(klass)s object and return its result.
Use `.pipe` when you want to improve readability by chaining together
functions that expect Series, DataFrames, GroupBy or Resampler objects.
Instead of writing
>>> h = lambda x, arg2, arg3: x + 1 - arg2 * arg3
>>> g = lambda x, arg1: x * 5 / arg1
>>> f = lambda x: x ** 4
>>> df = pd.DataFrame([["a", 4], ["b", 5]], columns=["group", "value"])
>>> h(g(f(df.groupby('group')), arg1=1), arg2=2, arg3=3) # doctest: +SKIP
You can write
>>> (df.groupby('group')
... .pipe(f)
... .pipe(g, arg1=1)
... .pipe(h, arg2=2, arg3=3)) # doctest: +SKIP
which is much more readable.
Parameters
----------
func : callable or tuple of (callable, str)
Function to apply to this %(klass)s object or, alternatively,
a `(callable, data_keyword)` tuple where `data_keyword` is a
string indicating the keyword of `callable` that expects the
%(klass)s object.
args : iterable, optional
Positional arguments passed into `func`.
kwargs : dict, optional
A dictionary of keyword arguments passed into `func`.
Returns
-------
the return type of `func`.
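The two accepted forms of ``func`` can be sketched as follows. The helpers ``spread`` and ``scale_by`` and the data are hypothetical; ``scale_by`` takes the grouped object through its ``data`` keyword, which is what the ``(callable, data_keyword)`` tuple form is for.

```python
import pandas as pd

df = pd.DataFrame([["a", 4], ["b", 5], ["a", 6]], columns=["group", "value"])
grouped = df.groupby("group")["value"]


def spread(gb):
    """Hypothetical helper: expects the GroupBy object as first argument."""
    return gb.max() - gb.min()


def scale_by(factor, data):
    """Hypothetical helper: expects the GroupBy object as ``data``."""
    return data.sum() * factor


# Plain callable: the GroupBy object is passed as the first argument.
plain = grouped.pipe(spread)

# Tuple form: the GroupBy object is passed as the ``data`` keyword,
# leaving the positional slot free for ``factor``.
keyword = grouped.pipe((scale_by, "data"), 10)
```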
See Also
--------
Series.pipe : Apply a function with arguments to a series.
DataFrame.pipe : Apply a function with arguments to a dataframe.
apply : Apply function to each group instead of to the
full %(klass)s object.
Notes
-----
See more `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#piping-function-calls>`_
Examples
--------
%(examples)s
Call function producing a same-indexed %(klass)s on each group.
Returns a %(klass)s having the same indexes as the original object
filled with the transformed values.
Parameters
----------
f : function, str
Function to apply to each group. See the Notes section below for requirements.
Accepted inputs are:
- String
- Python function
- Numba JIT function with ``engine='numba'`` specified.
Only passing a single function is supported with this engine.
If the ``'numba'`` engine is chosen, the function must be
a user defined function with ``values`` and ``index`` as the
first and second arguments respectively in the function signature.
Each group's index will be passed to the user defined function
and optionally available for use.
If a string is chosen, then it needs to be the name
of the groupby method you want to use.
*args
Positional arguments to pass to func.
engine : str, default None
* ``'cython'`` : Runs the function through C-extensions from cython.
* ``'numba'`` : Runs the function through JIT compiled code from numba.
* ``None`` : Defaults to ``'cython'`` or the global setting ``compute.use_numba``
engine_kwargs : dict, default None
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{'nopython': True, 'nogil': False, 'parallel': False}`` and will be
applied to the function
**kwargs
Keyword arguments to be passed into func.
Returns
-------
%(klass)s
See Also
--------
%(klass)s.groupby.apply : Apply function ``func`` group-wise and combine
the results together.
%(klass)s.groupby.aggregate : Aggregate using one or more
operations over the specified axis.
%(klass)s.transform : Call ``func`` on self producing a %(klass)s with the
same axis shape as self.
Notes
-----
Each group is endowed with the attribute 'name' in case you need to know
which group you are working on.
The current implementation imposes three requirements on f:
* f must return a value that either has the same shape as the input
subframe or can be broadcast to the shape of the input subframe.
For example, if `f` returns a scalar it will be broadcast to have the
same shape as the input subframe.
* if this is a DataFrame, f must support application column-by-column
in the subframe. If f also supports application to the entire subframe,
then a fast path is used starting from the second chunk.
* f must not mutate groups. Mutation is not supported and may
produce unexpected results. See :ref:`gotchas.udf-mutation` for more details.
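The shape requirement in the first item can be illustrated with a small sketch on assumed data: a method name, a function returning a scalar (broadcast to the group's shape), and a function returning a same-shaped result all produce output indexed like the input.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 3.0, 5.0]})
grouped = df.groupby("key")["val"]

# String form: the name of a groupby method.
by_name = grouped.transform("mean")

# Scalar return value: broadcast across every row of the group.
by_scalar = grouped.transform(lambda x: x.mean())

# Same-shaped return value: used as is.
centered = grouped.transform(lambda x: x - x.mean())
```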
When using ``engine='numba'``, there will be no "fall back" behavior internally.
The group data and group index will be passed as numpy arrays to the JITed
user defined function, and no alternative execution attempts will be tried.
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``,
see the examples below.
.. versionchanged:: 2.0.0
When using ``.transform`` on a grouped DataFrame and the transformation function
returns a DataFrame, pandas now aligns the result's index
with the input's index. You can call ``.to_numpy()`` on the
result of the transformation function to avoid alignment.
Examples
--------
%(example)s
Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list, dict or None
Function to use for aggregating the data. If a function, must either
work when passed a {klass} or when passed to {klass}.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- None, in which case ``**kwargs`` are used with Named Aggregation. Here the
output has one column for each element in ``**kwargs``. The name of the
column is the keyword, whereas the value determines the aggregation used to compute
the values in the column.
Can also accept a Numba JIT function with
``engine='numba'`` specified. Only passing a single function is supported
with this engine.
If the ``'numba'`` engine is chosen, the function must be
a user defined function with ``values`` and ``index`` as the
first and second arguments respectively in the function signature.
Each group's index will be passed to the user defined function
and optionally available for use.
.. deprecated:: 2.1.0
Passing a dictionary is deprecated and will raise in a future version
of pandas. Pass a list of aggregations instead.
*args
Positional arguments to pass to func.
engine : str, default None
* ``'cython'`` : Runs the function through C-extensions from cython.
* ``'numba'`` : Runs the function through JIT compiled code from numba.
* ``None`` : Defaults to ``'cython'`` or the global setting ``compute.use_numba``
engine_kwargs : dict, default None
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{{'nopython': True, 'nogil': False, 'parallel': False}}`` and will be
applied to the function
**kwargs
* If ``func`` is None, ``**kwargs`` are used to define the output names and
aggregations via Named Aggregation. See ``func`` entry.
* Otherwise, keyword arguments to be passed into func.
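A minimal sketch of Named Aggregation on a grouped Series, using assumed data and output names: with ``func`` left as ``None``, each keyword becomes an output column and its value names the aggregation.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "a", "b", "b"])

# ``smallest`` and ``largest`` are illustrative output column names.
summary = s.groupby(level=0).agg(smallest="min", largest="max")
```

The result is a DataFrame with one row per group and one column per keyword.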
Returns
-------
{klass}
See Also
--------
{klass}.groupby.apply : Apply function func group-wise
and combine the results together.
{klass}.groupby.transform : Transforms the Series on each group
based on the given function.
{klass}.aggregate : Aggregate using one or more
operations over the specified axis.
Notes
-----
When using ``engine='numba'``, there will be no "fall back" behavior internally.
The group data and group index will be passed as numpy arrays to the JITed
user defined function, and no alternative execution attempts will be tried.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``,
see the examples below.
{examples}
Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list, dict or None
Function to use for aggregating the data. If a function, must either
work when passed a {klass} or when passed to {klass}.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- dict of axis labels -> functions, function names or list of such.
- None, in which case ``**kwargs`` are used with Named Aggregation. Here the
output has one column for each element in ``**kwargs``. The name of the
column is the keyword, whereas the value determines the aggregation used to compute
the values in the column.
Can also accept a Numba JIT function with
``engine='numba'`` specified. Only passing a single function is supported
with this engine.
If the ``'numba'`` engine is chosen, the function must be
a user defined function with ``values`` and ``index`` as the
first and second arguments respectively in the function signature.
Each group's index will be passed to the user defined function
and optionally available for use.
*args
Positional arguments to pass to func.
engine : str, default None
* ``'cython'`` : Runs the function through C-extensions from cython.
* ``'numba'`` : Runs the function through JIT compiled code from numba.
* ``None`` : Defaults to ``'cython'`` or the global setting ``compute.use_numba``
engine_kwargs : dict, default None
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{{'nopython': True, 'nogil': False, 'parallel': False}}`` and will be
applied to the function
**kwargs
* If ``func`` is None, ``**kwargs`` are used to define the output names and
aggregations via Named Aggregation. See ``func`` entry.
* Otherwise, keyword arguments to be passed into func.
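The dict form of ``func`` can be sketched on a grouped DataFrame as below. The data is an assumption; the dict maps column labels to a function name or a list of function names.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

# One aggregation for "B", two for "C".
per_column = df.groupby("A").agg({"B": "sum", "C": ["min", "max"]})
```

Because one entry holds a list, the result's columns form a MultiIndex of ``(column, aggregation)`` pairs.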
Returns
-------
{klass}
See Also
--------
{klass}.groupby.apply : Apply function func group-wise
and combine the results together.
{klass}.groupby.transform : Transforms the Series on each group
based on the given function.
{klass}.aggregate : Aggregate using one or more
operations over the specified axis.
Notes
-----
When using ``engine='numba'``, there will be no "fall back" behavior internally.
The group data and group index will be passed as numpy arrays to the JITed
user defined function, and no alternative execution attempts will be tried.
Functions that mutate the passed object can produce unexpected
behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
for more details.
.. versionchanged:: 1.3.0
The resulting dtype will reflect the return value of the passed ``func``,
see the examples below.
{examples}