Current File : //usr/local/lib/python3.12/site-packages/pandas/io/__pycache__/stata.cpython-312.pyc

�

Mٜgn���UdZddlmZddlmZddlmZmZddlmZddl	Z	ddl
Z
ddlZddlm
Z
mZmZmZmZmZddlZddlZddlmZdd	lmZdd
lmZddlmZmZmZm Z ddl!m"Z"m#Z#dd
l$m%Z%ddl&m'Z'ddl(m)Z)m*Z*m+Z+ddl,m-Z-ddl.m/Z/m0Z0m1Z1m2Z2m3Z3m4Z4m5Z5ddl6m7Z7ddl8m9Z9ddl:m;Z;ddl<m=Z=ddl>m?Z?ddl@mAZAer$ddlBmCZCmDZDddlEmFZFddlmGZGddlHmIZImJZJmKZKmLZLmMZMmNZNdZOdZPdZQdZRd ZSd!ZTd"eP�d#eQ�d#eR�d#eS�d#e?d$d%z�d#e?d&�d'eT�d(�ZUd)eP�d#eQ�d*�ZVd+eP�d#eQ�d#eR�d#e?d$�d#e?d&�d,eT�d#�
ZWgd-�ZXed.d/d/�ZYd0eZd1<dad2�Z[dad3�Z\d4Z]d0eZd5<d6Z^d0eZd7<d8Z_d0eZd9<d:Z`d0eZd;<d<Zad0eZd=<dbd>�ZbGd?�d@�ZcGdA�dBec�ZdGdC�dD�ZeGdE�dF�ZfGdG�dHefej��Zhe"eU�dIdIddJdIddIddJdKddL�																									dcdM��ZidddN�ZjdedO�ZkdfdP�ZldgdQ�ZmdhdR�Zn	di							djdS�Zoe#e?d&e?dTdUz�V�GdW�dXef��ZpdkdY�ZqdldZ�ZrGd[�d\�ZsGd]�d^ep�ZtGd_�d`et�Zuy)ma�
Module contains tools for processing Stata files into DataFrames

The StataReader below was originally written by Joe Presbrey as part of PyDTA.
It has been extended and improved by Skipper Seabold from the Statsmodels
project who also developed the StataWriter and was finally added to pandas in
a once again improved version.

You can find more information on http://presbrey.mit.edu/PyDTA and
https://www.statsmodels.org/devel/
�)�annotations)�abc)�datetime�	timedelta)�BytesION)�IO�
TYPE_CHECKING�AnyStr�Callable�Final�cast)�lib)�infer_dtype)�max_len_string_array)�CategoricalConversionWarning�InvalidColumnName�PossiblePrecisionLoss�ValueLabelTypeMismatch)�Appender�doc)�find_stack_level)�ExtensionDtype)�
ensure_object�is_numeric_dtype�is_string_dtype)�CategoricalDtype)�Categorical�
DatetimeIndex�NaT�	Timestamp�isna�to_datetime�to_timedelta)�	DataFrame)�Index)�
RangeIndex)�Series)�_shared_docs)�
get_handle)�Hashable�Sequence)�
TracebackType)�Literal)�CompressionOptions�FilePath�
ReadBuffer�Self�StorageOptions�WriteBufferz�Version of given Stata file is {version}. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).z�convert_dates : bool, default True
    Convert date variables to DataFrame time values.
convert_categoricals : bool, default True
    Read value labels and convert columns to Categorical/Factor variables.aindex_col : str, optional
    Column to set as index.
convert_missing : bool, default False
    Flag indicating whether to convert missing values to their Stata
    representations.  If False, missing values are replaced with nan.
    If True, columns containing missing values are returned with
    object data types and missing values are represented by
    StataMissingValue objects.
preserve_dtypes : bool, default True
    Preserve Stata datatypes. If False, numeric data are upcast to pandas
    default types for foreign data (float64 or int64).
columns : list or None
    Columns to retain.  Columns will be returned in the given order.  None
    returns all columns.
order_categoricals : bool, default True
    Flag indicating whether converted categorical data are ordered.zzchunksize : int, default None
    Return StataReader object for iterations, returns chunks with
    given number of lines.z=iterator : bool, default False
    Return StataReader object.z�Notes
-----
Categorical variables read through an iterator may not have the same
categories and dtype. This occurs when a variable stored in a DTA
file is associated with an incomplete set of value labels that only
label a strict subset of the values.a>
Read Stata file into DataFrame.

Parameters
----------
filepath_or_buffer : str, path object or file-like object
    Any valid string path is acceptable. The string could be a URL. Valid
    URL schemes include http, ftp, s3, and file. For file URLs, a host is
    expected. A local file could be: ``file://localhost/path/to/table.dta``.

    If you want to pass in a path object, pandas accepts any ``os.PathLike``.

    By file-like object, we refer to objects with a ``read()`` method,
    such as a file handle (e.g. via builtin ``open`` function)
    or ``StringIO``.
�
�decompression_options�filepath_or_buffer�storage_optionsz�

Returns
-------
DataFrame or pandas.api.typing.StataReader

See Also
--------
io.stata.StataReader : Low-level reader for Stata data files.
DataFrame.to_stata: Export Stata data files.

a

Examples
--------

Create a dummy Stata file for this example:

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
...                     'speed': [350, 18, 361, 15]})  # doctest: +SKIP
>>> df.to_stata('animals.dta')  # doctest: +SKIP

Read a Stata dta file:

>>> df = pd.read_stata('animals.dta')  # doctest: +SKIP

Read a Stata dta file in 10,000-line chunks:

>>> values = np.random.randint(0, 10, size=(20_000, 1), dtype="uint8")  # doctest: +SKIP
>>> df = pd.DataFrame(values, columns=["i"])  # doctest: +SKIP
>>> df.to_stata('filename.dta')  # doctest: +SKIP

>>> with pd.read_stata('filename.dta', chunksize=10000) as itr: # doctest: +SKIP
...     for chunk in itr:
...         # Operate on a single chunk, e.g., chunk.mean()
...         pass  # doctest: +SKIP
z�Reads observations from Stata file, converting them into a dataframe

Parameters
----------
nrows : int
    Number of lines to read from the data file. If None, read the whole file.
z

Returns
-------
DataFrame
z�Class for reading Stata dta files.

Parameters
----------
path_or_buf : path (string), buffer or path object
    String, path object (pathlib.Path or py._path.local.LocalPath) or object
    implementing a binary read() function.
z

)	�%tc�%tC�%td�%d�%tw�%tm�%tq�%th�%ty��r�stata_epochc��������tjjtjjc��tjt	ddd�z
j
�tjt	ddd�z
j
��dzdzdz��dzdzdz�d"��fd�}d"��fd�}d"����fd�}t
j|�}d	}|j�rd
}d|j|<|jtj�}|jd�rt}|}|||d
�}	�n�|jd�r=tjdt!���t#|t$��}	|r	t&|	|<|	S|jd�rt}|}
|||
d�}	�n+|jd�r(tj|dzz}|dzdz}
|||
�}	n�|jd�r(tj|dzz}|dzdz}|||�}	n�|jd�r+tj|dzz}|dzdzdz}
|||
�}	n}|jd�r+tj|dzz}|dzdzdz}|||�}	nA|jd�r!|}t
j(|�}|||�}	nt+d |�d!���|r	t&|	|<|	S)#a
    Convert from SIF to datetime. https://www.stata.com/help.cgi?datetime

    Parameters
    ----------
    dates : Series
        The Stata Internal Format date to convert to datetime according to fmt
    fmt : str
        The Stata format of ``dates``. Can be tc, td, tw, tm, tq, th, or ty.

    Returns
    -------
    converted : Series
        The converted dates

    Examples
    --------
    >>> dates = pd.Series([52])
    >>> _stata_elapsed_date_to_datetime_vec(dates, "%tw")
    0   1961-01-01
    dtype: datetime64[ns]

    Notes
    -----
    datetime/c - tc
        milliseconds since 01jan1960 00:00:00.000, assuming 86,400 s/day
    datetime/C - tC - NOT IMPLEMENTED
        milliseconds since 01jan1960 00:00:00.000, adjusted for leap seconds
    date - td
        days since 01jan1960 (01jan1960 = 0)
    weekly date - tw
        weeks since 1960w1
        This assumes 52 weeks in a year, then adds 7 * remainder of the weeks.
        The datetime value is the start of the week in terms of days in the
        year, not ISO calendar weeks.
    monthly date - tm
        months since 1960m1
    quarterly date - tq
        quarters since 1960q1
    half-yearly date - th
        half-years since 1960h1
    yearly date - ty
        years since 0000
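
    The per-format rules listed above can be sketched for a single scalar
    value with the standard library alone. This is an illustrative sketch,
    not the vectorized implementation in this module: the name
    ``sif_to_datetime`` is hypothetical, and tC is left out because it is
    not implemented here either.

```python
from datetime import datetime, timedelta

# Stata counts every elapsed unit from 01jan1960.
STATA_EPOCH = datetime(1960, 1, 1)


def sif_to_datetime(value: int, fmt: str) -> datetime:
    """Convert one Stata Internal Format value to a datetime (hypothetical helper)."""
    if fmt.startswith(("%tc", "tc")):
        # milliseconds since the epoch, assuming 86,400 s/day
        return STATA_EPOCH + timedelta(milliseconds=value)
    if fmt.startswith(("%td", "td", "%d")):
        # days since the epoch
        return STATA_EPOCH + timedelta(days=value)
    if fmt.startswith(("%tw", "tw")):
        # 52 weeks per year; the remainder is a 7-day offset into the year
        year = STATA_EPOCH.year + value // 52
        return datetime(year, 1, 1) + timedelta(days=(value % 52) * 7)
    if fmt.startswith(("%tm", "tm")):
        return datetime(STATA_EPOCH.year + value // 12, value % 12 + 1, 1)
    if fmt.startswith(("%tq", "tq")):
        return datetime(STATA_EPOCH.year + value // 4, (value % 4) * 3 + 1, 1)
    if fmt.startswith(("%th", "th")):
        return datetime(STATA_EPOCH.year + value // 2, (value % 2) * 6 + 1, 1)
    if fmt.startswith(("%ty", "ty")):
        # yearly dates are plain calendar years
        return datetime(value, 1, 1)
    raise ValueError(f"Date fmt {fmt} not understood")
```

    Floor division keeps values before the epoch consistent: a monthly value
    of -1 maps to December 1959.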
    rArB�i��c���|j��kr&|j��kDrtd|z|zd��St|dd�}t	t||�D��cgc]\}}t
||d���c}}|��Scc}}w)z�
        Convert year and month to datetimes, using pandas vectorized versions
        when the date range falls within the range supported by pandas.
        Otherwise it falls back to a slower but more robust method
        using datetime.
        �dz%Y%m��format�indexNrB�rK)�max�minr"�getattrr'�zipr)�year�monthrK�y�m�MAX_YEAR�MIN_YEARs     ���:/usr/local/lib/python3.12/site-packages/pandas/io/stata.py�convert_year_month_safezD_stata_elapsed_date_to_datetime_vec.<locals>.convert_year_month_safe!sz����8�8�:�� �T�X�X�Z�(�%:��s�T�z�E�1�&�A�A��D�'�4�0�E���T�5�9I�J�9I���A�8�A�q�!�,�9I�J�RW�X�X��Js�A=
c
�J��|j��dz
kr.|j��kDrt|d��t|d��zSt	|dd�}t||�D��cgc](\}}t
|dd�tt|���z��*}}}t||�	�Scc}}w)
z{
        Converts year (e.g. 1999) and days since the start of the year to a
        datetime or datetime64 Series
        rB�%YrI�d��unitrKN��daysrL)
rMrNr"r#rOrPrr�intr')rQr_rKrSr[�valuerUrVs      ��rW�convert_year_days_safezC_stata_elapsed_date_to_datetime_vec.<locals>.convert_year_days_safe.s����
�8�8�:��A��&�4�8�8�:��+@��t�D�1�L��C�4P�P�P��D�'�4�0�E�GJ�4�QU���GV�t�q�!���A�q�!�I�3�q�6�$:�:��
���%�u�-�-��s�"-Bc	���t|dd�}|dk(rX|j��kDs|j��kr�|D�cgc]}|tt	|���z��}}t||��S|dk(r[|j��kDs|j��	kr@|D�cgc]}|tt	|�dz��z��}}t||��St
d	��t|�}t||�
�}||zScc}wcc}w)z�
        Convert base dates and deltas to datetimes, using pandas vectorized
        versions if the deltas satisfy restrictions required to be expressed
        as dates in pandas.
        rKNr[r^rL�msrF)�microsecondszformat not understoodr\)	rOrMrNrr`r'�
ValueErrorr"r#)
�base�deltasr]rKr[�values�
MAX_DAY_DELTA�MAX_MS_DELTA�
MIN_DAY_DELTA�MIN_MS_DELTAs
      ����rW�convert_delta_safez?_stata_elapsed_date_to_datetime_vec.<locals>.convert_delta_safe<s��������.���3�;��z�z�|�m�+�v�z�z�|�m�/K�AG�H��A�$���A��!7�7���H��f�E�2�2�
�T�\��z�z�|�l�*�f�j�j�l�\�.I�LR��LR�q�D�9�3�q�6�D�=�B�B�F����f�E�2�2��4�5�5��4� ���f�4�0���f�}���I��s�C4�"C9FTg�?�r8�tcrd�r9�tCz9Encountered %tC format. Leaving in Stata Internal Format.��
stacklevel��dtype)r:�tdr;r[r[�r<�tw�4��r=�tm��r>�tq���r?�th���r@�tyz	Date fmt � not understood)�returnr')r rNrQrMrr_�np�isnan�any�_values�astype�int64�
startswithrC�warnings�warnrr'�objectr�	ones_likerf)�dates�fmtrXrbrn�bad_locs�has_bad_valuesrgrd�
conv_datesr_rQrR�
quarter_month�first_monthrjrkrUrlrmrVs               @@@@@@rW�#_stata_elapsed_date_to_datetime_vecr��s����\#���+�+�Y�]�]�-?�-?��H�h��]�]�X�d�A�q�%9�9�?�?�M��]�]�X�d�A�q�%9�9�?�?�M� �2�%��,�t�3�L� �2�%��,�t�3�L�Y�.���2�x�x���H��N��|�|�~���"%��
�
�h���L�L����"�E�
�~�~�m�$���
��'��b�$�7�
�	���
�	&��
�
�G�'�)�	
��E��0�
��#&�J�x� ���	���0�	1�����'��d�C�8�
�

���
�	&����%�2�+�-����
�a���+�D�$�7�
�	���
�	&����%�2�+�-�����q� ��,�T�5�9�
�	���
�	&����%�1�*�,�����a��!�+�
�,�T�=�A�
�	���
�	&����%�1�*�,�����a��!�#��,�T�5�9�
�	���
�	&����l�l�5�)��,�T�;�?�
��9�S�E��9�:�:��"�
�8����c����	�|j�	d��dz�	d#							d$���	fd�
}t|�}|j�	|j�rPtj|j
d�rt
t�|j|<nt|j|<|dvr||d��}|jdz}�n�|d	vr#tjd
t���|}�n\|dvr||d��}|j�z}�n=|d
vr<||dd��}d|jtjz
z|jdzz}n�|dvr;||d��}d|jtjz
z|jzdz
}n�|dvr>||d��}d|jtjz
z|jdz
dzz}n||dvrN||d��}d|jtjz
z|jdkDj!t"�z}n*|dvr||d��}|j}nt%d|�d���t'|t(j*d��}t-j.dd �d!}|||<t'|�	d�"�S)%aO
    Convert from datetime to SIF. https://www.stata.com/help.cgi?datetime

    Parameters
    ----------
    dates : Series
        Series or array containing datetime or datetime64[ns] to
        convert to the Stata Internal Format given by fmt
    fmt : str
        The format to convert to. Can be tc, td, tw, tm, tq, th, or ty.
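
    As a rough scalar illustration of the conversion in this direction, the
    sketch below maps one ``datetime`` to its elapsed value for each
    supported format. The name ``datetime_to_sif`` is hypothetical and the
    code assumes a naive ``datetime``; the module itself works on whole
    Series and also handles missing values.

```python
from datetime import datetime

# Stata counts every elapsed unit from 01jan1960.
STATA_EPOCH = datetime(1960, 1, 1)


def datetime_to_sif(value: datetime, fmt: str) -> int:
    """Convert one datetime to a Stata elapsed value (hypothetical helper)."""
    years = value.year - STATA_EPOCH.year
    if fmt in ("%tc", "tc"):
        delta = value - STATA_EPOCH
        # whole milliseconds since the epoch
        return (
            delta.days * 86_400_000
            + delta.seconds * 1000
            + delta.microseconds // 1000
        )
    if fmt in ("%td", "td"):
        return (value - STATA_EPOCH).days
    if fmt in ("%tw", "tw"):
        # zero-based day of the year, grouped into 7-day blocks
        day_of_year = (value - datetime(value.year, 1, 1)).days
        return 52 * years + day_of_year // 7
    if fmt in ("%tm", "tm"):
        return 12 * years + value.month - 1
    if fmt in ("%tq", "tq"):
        return 4 * years + (value.month - 1) // 3
    if fmt in ("%th", "th"):
        return 2 * years + int(value.month > 6)
    if fmt in ("%ty", "ty"):
        return value.year
    raise ValueError(f"Format {fmt} is not a known Stata date format")
```

    For example, ``datetime_to_sif(datetime(1961, 1, 1), "%tw")`` gives 52,
    the first week of the second year after the epoch.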
    l�"R:rFFc����i}tj|jd��r|rP|tt�jd�z
}|jjtj�dz|d<|s|r=t|�}|jj|d<|jj|d<|�r%|jjtj�t|dd��jjtj�z
}|�z|d	<n�t|d
��dk(r�|r9|jtz
}d�
fd
�}tj |�}	|	|�|d<|r<|j#d��}
|
jdz|d<|
j|ddzz
|d<|r0dd�}tj |�}	|	|�|d	<nt%d��t'|���S)N�M�nsrF�deltarQrRrZrIr_F��skipnarc�\���|jzd|jzz|jzS)Ni@B)r_�secondsre)�x�
US_PER_DAYs �rW�fzC_datetime_to_stata_elapsed_vec.<locals>.parse_dates_safe.<locals>.f�s)���%����.��1�9�9�1D�D�q�~�~�U�Ur�c�:�d|jz|jzS)NrH)rQrR�r�s rW�<lambda>zJ_datetime_to_stata_elapsed_vec.<locals>.parse_dates_safe.<locals>.<lambda>�s��3����<�!�'�'�3Ir�rHc�J�|t|jdd�z
jS)NrB)rrQr_r�s rW�gzC_datetime_to_stata_elapsed_vec.<locals>.parse_dates_safe.<locals>.g�s ��������A� 6�6�<�<�<r�zQColumns containing dates must contain either datetime64, datetime or null values.rL)r�rr��float)r�rr�r`)r�is_np_dtypervr rC�as_unitr��viewr�r�r�_datarQrRr"r�	vectorize�applyrfr$)r�r�rQr_r[�
time_delta�
date_index�
days_in_nsr��v�
year_monthr��
NS_PER_DAYr�rKs            ���rW�parse_dates_safez8_datetime_to_stata_elapsed_vec.<locals>.parse_dates_safe�s����
���?�?�5�;�;��,��"�Y�{�%;�%C�%C�D�%I�I�
�'�/�/�4�4�R�X�X�>�$�F��'�
��t�*�5�1�
�&�,�,�1�1��&�	�'�-�-�3�3��'�
��"�]�]�/�/����9�K��f�I�d�=��'�$�$�r�x�x�.�)�
�'�*�4��&�	�
��u�
-��
;���
�
��3��V��L�L��O���u�X��'�
��"�[�[�)I�J�
�&�.�.�#�5��&�	�'�/�/�!�F�)�c�/�A��'�
��=��L�L��O���e�H��&�	��7��
�
��%�(�(r�r�roT)r�rqz'Stata Internal Format tC not supported.rs)r:rwrx)rQr_rzr{r|)rQr~rBrr�r�r�r�r�r��Format z! is not a known Stata date format)rv�copy�<d��r�rKr�)FFF)r�r'r��boolrQr�r_r�)rKr!r�rr�rvr"rCr�r�r�r�rrQr_rRr�r`rfr'r��float64�struct�unpack)
r�r�r��bad_locr[r��
missing_valuer�r�rKs
       @@@rW�_datetime_to_stata_elapsed_vecr��so���
�K�K�E�/�J��d�"�J�NS�,)��,)�"�,)�26�,)�FJ�,)�\�5�k�G��K�K�E��{�{�}��?�?�5�;�;��,�%0��%=�E�M�M�'�"�%0�E�M�M�'�"�
�m���U�$�/���W�W�t�^�
�	�
�	��
�
�5�'�)�	
��
�	�
�	��U�$�/���W�W�
�*�
�	�
�	��U��D�9���1�6�6�K�$4�$4�4�5����!��C�
�	�
�	��U��.���1�6�6�K�$4�$4�4�5����?�!�C�
�	�
�	��U��.���!�&�&�;�#3�#3�3�4����!���7I�I�
�	�
�	��U��.���!�&�&�;�#3�#3�3�4����!��7K�7K�C�7P�P�
�	�
�	��U��.���V�V�
��7�3�%�'H�I�J�J��
�"�*�*�5�A�J��M�M�$�(K�L�Q�O�M�'�J�w���*�E��6�6r�z�
Fixed width strings in Stata .dta files are limited to 244 (or fewer)
characters.  Column '{0}' does not satisfy this restriction. Use the
'version=117' parameter to write the newer (Stata 13 and later) format.
�excessive_string_length_errorz�
Column converted from {0} to {1}, and some data are outside of the lossless
conversion range. This may result in a loss of precision in the saved data.
�precision_loss_docz�
Stata value labels (pandas categories) must be strings. Column {0} contains
non-string labels which will be converted to strings.  Please check that the
Stata data file created has not lost information due to duplicate labels.
�value_label_mismatch_doca;
Not all pandas column names were valid Stata variable names.
The following replacements have been made:

    {0}

If this is not what you expect, please make sure you have Stata-compliant
column names in your DataFrame (strings only, max 32 characters, only
alphanumerics and underscores, no Stata reserved words)
�invalid_name_doca�
One or more series with value labels are not fully labeled. Reading this
dataset with an iterator results in categorical variables with different
categories. This occurs since it is not possible to know all possible values
until the entire dataset has been read. To avoid this warning, you can either
read the dataset without an iterator, or manually convert categorical data by
setting ``convert_categoricals`` to False and then accessing the variable labels
through the value_labels method of the reader.
�categorical_conversion_warningc
�H�d}tjtjtjftjtjtjftj
tjtjftjtjtjftjtjtjff}tjdd�d}tjdd�d}|D�]�}t||jt�xr||jj dv}||j#�}|r]||jj dvrdnd	}||j%|�j'||jj(�||<n�t||jt�rxt+||jd
d��/||j'||jj(�||<n/t-||j�r||j'd�||<||j}	|j.ddk(}
|D]�}|	|dk(s�|
s6||j1�tj2|d
�j0kr|d
}	n|d}	|dtjk(r,||j1�dk\rt4j7dd�}||j'|	�||<��|	tjk(rV|
sT||j1�dkDs||j9�dk�r5||j'tj�||<�n|	tjk(rV|
sT||j1�dkDs||j9�dk�r�||j'tj�||<�n�|	tjk(r�|
s,||j1�dkr=||j9�dk\r'||j'tj�||<�n=||j'tj�||<||j1�dk\s||j9�dkr�t4j7dd�}n�|	tj:tjfvr�tj<||�j?�rtAd|�d���||j1�}|	tj:k(r+||kDr&||j'tj�||<n-|	tjk(r||kDrtAd|�d|�d|�d���|s���|j?�s���tBjD||jjF}
|
|jH||f<���|r$tKjL|tNtQ���|S) a-
    Checks the dtypes of the columns of a pandas DataFrame for
    compatibility with the data types and ranges supported by Stata, and
    converts if necessary.

    Parameters
    ----------
    data : DataFrame
        The DataFrame to check and convert

    Notes
    -----
    Numeric columns in Stata must be one of int8, int16, int32, float32 or
    float64, with some additional value restrictions.  int8 and int16 columns
    are checked for violations of the value restrictions and upcast if needed.
    int64 data is not usable in Stata, and so it is downcast to int32 whenever
    the values are in the int32 range, and sidecast to float64 when larger than
    this range.  If the int64 values are outside of the range of those
    perfectly representable as float64 values, a warning is raised.

    bool columns are cast to int8.  uint columns are converted to int of the
    same size if there is no loss in precision, otherwise they are upcast to a
    larger type.  uint64 is currently not supported since it is converted to
    object in a DataFrame.
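
    The integer rules in these notes can be sketched as a small decision
    function over a column's minimum and maximum. This is a simplified
    illustration, not this module's code: ``stata_int_dtype`` and
    ``may_lose_precision`` are hypothetical names, the upper limits follow
    from the values Stata reserves for missing codes (101 to 127 for int8,
    and so on), and the lower limits are assumed to be symmetric with the
    type's maximum.

```python
# Usable (non-missing) value range per Stata integer type.
# Upper limits sit just below the reserved missing-value codes;
# lower limits are an assumption for this sketch.
VALID_RANGE = {
    "int8": (-127, 100),
    "int16": (-32767, 32740),
    "int32": (-2147483647, 2147483620),
}
UPCAST_ORDER = ["int8", "int16", "int32"]


def stata_int_dtype(dtype: str, lo: int, hi: int) -> str:
    """Pick the dtype a column with range [lo, hi] would be stored as."""
    if dtype == "int64":
        # int64 has no Stata equivalent: downcast if it fits, else float64
        vmin, vmax = VALID_RANGE["int32"]
        return "int32" if vmin <= lo and hi <= vmax else "float64"
    # Smaller types are only ever upcast, never downcast.
    for name in UPCAST_ORDER[UPCAST_ORDER.index(dtype):]:
        vmin, vmax = VALID_RANGE[name]
        if vmin <= lo and hi <= vmax:
            return name
    return "float64"


def may_lose_precision(lo: int, hi: int) -> bool:
    """True when an integer range exceeds what float64 represents exactly."""
    return hi >= 2**53 or lo <= -(2**53)
```

    Under these assumptions an int8 column whose maximum is 101 becomes
    int16, because 101 is already the code for the missing value '.'.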
    ��<f����~rr����������iub�iuF�numpy_dtypeNr�rBr�l�uint64r�rH�������������l����r�zColumn zM contains infinity or -infinitywhich is outside the range supported by Stata.z has a maximum value (z() outside the range supported by Stata (�)rs))r��bool_�int8�uint8�int16�uint16�int32�uint32r�r�r�r�r��
isinstancervr�kindr!�fillnar�r�rOr�shaperM�iinfor�rJrN�float32�isinfr�rf�StataMissingValue�BASE_MISSING_VALUES�name�locr�r�rr)�data�ws�conversion_data�float32_max�float64_max�col�is_nullable_int�orig_missing�fvrv�empty_df�c_datara�sentinels              rW�_cast_to_stata_typesr�#s���4
�B�
���2�7�7�B�G�G�$�	���2�7�7�B�H�H�%�	���B�H�H�b�h�h�'�	���B�H�H�b�h�h�'�	���B�H�H�b�j�j�)�	�
��-�-��&9�:�1�=�K��-�-��&I�J�1�M�K���
�t�C�y����7�
.��S�	���$�$��-�	�
�C�y�~�~�'����3�i�o�o�*�*�d�2���B��S�	�(�(��,�3�3�D��I�O�O�4O�4O�P�D��I�
��S�	����
8��t�C�y���
�t�<�H� ��I�,�,�T�#�Y�_�_�-H�-H�I��S�	� ��c����1� ��I�,�,�X�6��S�	��S�	�����:�:�a�=�A�%��%�F���q�	�!��t�C�y�}�}��"�(�(�6�!�9�2E�2I�2I�I�"�1�I�E�"�1�I�E��!�9����(��C�y�}�}��%�/�/�6�6�x��K�� ��I�,�,�U�3��S�	�&��B�G�G��H��C�y�}�}���$��S�	�
�
��$�(>� ��I�,�,�R�X�X�6��S�	�
�b�h�h�
�x��C�y�}�}���&�$�s�)�-�-�/�F�*B� ��I�,�,�R�X�X�6��S�	�
�b�h�h�
���S�	�
�
��:�-�$�s�)�-�-�/�[�2P� ��I�,�,�R�X�X�6��S�	� ��I�,�,�R�Z�Z�8��S�	���9�=�=�?�e�+�t�C�y�}�}��(�/J�+�2�2�7�I�F�B�
�r�z�z�2�:�:�.�
.��x�x��S�	�"�&�&�(� ��c�U�#E�E�����I�M�M�O�E���
�
�"�u�{�':� ��I�,�,�R�Z�Z�8��S�	��"�*�*�$��;�&�$�!�#��&<�U�G�D/�/:�m�1�>�������!�,�@�@��c����AU�AU�V��.6�����s�*�+�I�J
��
�
��!�'�)�	
��Kr�c�6�eZdZdZ	d					dd�Zdd�Zd	d�Zy)
�StataValueLabelz�
    Parse a categorical column and prepare formatted output

    Parameters
    ----------
    catarray : Series
        Categorical Series to encode
    encoding : {"latin-1", "utf-8"}
        Encoding to use for value labels.
    c��|dvrtd��|j|_||_|jj
}t
|�|_|j�y)N��latin-1�utf-8�%Only latin-1 and utf-8 are supported.)	rfr��labname�	_encoding�cat�
categories�	enumerate�value_labels�_prepare_value_labels)�self�catarray�encodingrs    rW�__init__zStataValueLabel.__init__�sS���/�/��D�E�E��}�}���!����\�\�,�,�
�%�j�1����"�"�$r�c�L�d|_g|_d|_tjgtj
��|_tjgtj
��|_d|_g}g}|jD]�}|d}t|t�sLt|�}tjtj|j �t"t%���|j'|j(�}|j+|j�|xjt|�dzz
c_|j+|d�|jj+|�|xjdz
c_��|jdkDrt-d��tj|tj
��|_tj|tj
��|_dd|jzzd|jzz|jz|_y	)
zEncode value labels.rrurBrsi}zaStata value labels for a single variable must have a combined length less than 32,000 characters.�r�N)�text_len�txt�nr��arrayr��off�val�lenr	r��strr�r�r�rJrrr�encoder�appendrf)r�offsetsri�vl�categorys     rWr
z%StataValueLabel._prepare_value_labels�s�����
� "�������8�8�B�b�h�h�/����8�8�B�b�h�h�/������ �� ���#�#�B�$&�q�E�H��h��,��x�=���
�
�,�3�3�D�L�L�A�*�/�1��
 ���t�~�~�6�H��N�N�4�=�=�)��M�M�S��]�Q�.�.�M��M�M�"�Q�%� ��H�H�O�O�H�%��F�F�a�K�F�$� �=�=�5� ��F��
��8�8�G�2�8�8�4����8�8�F�"�(�(�3����1�t�v�v�:�%��D�F�F�
�2�T�]�]�B��r�c��|j}t�}d}|jtj|dz|j
��t
|j�ddj|�}|dvrdnd}t||dz�}|j|�td�D]'}|jtjd	|���)|jtj|dz|j��|jtj|dz|j��|jD]*}|jtj|dz|���,|jD]*}	|jtj|dz|	���,|jD]}
|j|
|z��|j!�S)
a!
        Generate the binary representation of the value labels.

        Parameters
        ----------
        byteorder : str
            Byte order of the output

        Returns
        -------
        value_label : bytes
            Bytes containing the formatted value label
        ��iN� )r�utf8�rBr��c)rr�writer��packrrrr�
_pad_bytes�rangerrrrr�getvalue)r�	byteorderr
�bio�	null_byter�lab_lenr �offsetra�texts           rW�generate_value_labelz$StataValueLabel.generate_value_label�su���>�>���i���	�	�	�	�&�+�+�i�#�o�t�x�x�8�9��d�l�l�#�C�R�(�/�/��9�� �(9�9�"�s���W�g��k�2���	�	�'���q��A��I�I�f�k�k�#�y�1�2��
	�	�	�&�+�+�i�#�o�t�v�v�6�7�	�	�	�&�+�+�i�#�o�t�}�}�=�>��h�h�F��I�I�f�k�k�)�c�/�6�:�;���X�X�E��I�I�f�k�k�)�c�/�5�9�:���H�H�D��I�I�d�Y�&�'���|�|�~�r�N�r)rr'r
�Literal['latin-1', 'utf-8']r��None�r�r3)r*rr��bytes)�__name__�
__module__�__qualname__�__doc__rr
r0�r�rWr�r��s7��	�IR�
%��
%�*E�
%�	
�
%�*C�X2r�r�c�*�eZdZdZ	d							dd�Zy)�StataNonCatValueLabela
    Prepare formatted version of value labels

    Parameters
    ----------
    labname : str
        Value label name
    value_labels: Dictionary
        Mapping of values to labels
    encoding : {"latin-1", "utf-8"}
        Encoding to use for value labels.
    c��|dvrtd��||_||_t|j	�d���|_|j
�y)Nrrc��|dS)Nrr:r�s rWr�z0StataNonCatValueLabel.__init__.<locals>.<lambda>2s���!�r�)�key)rfrr�sorted�itemsr	r
)rrr	r
s    rWrzStataNonCatValueLabel.__init__&sP���/�/��D�E�E����!���"���� �n�
���	
�"�"�$r�Nr1)rrr	�dict[float, str]r
r2r�r3)r6r7r8r9rr:r�rWr<r<s7���"1:�	%��%�'�%�.�	%�

�%r�r<c���eZdZUdZiZded<dZded<eD])Zdee<edd	�D]Z	de
d
e	z�zee	ez<��+dZded
<ejdd�dZded<ed	�D]uZ	ejde�dZdee<e	dkDreexxe
d
e	z�z
cc<ejdej de��dezZej de�Z�wdZded<ejdd�dZed	�D]uZ	ejde�dZdee<e	dkDreexxe
d
e	z�z
cc<ejdej de��dezZej de�Z�wdddejde�dejde�dd�Zded<d&d�Zed'd��Zed(d ��Zd'd!�Zd'd"�Zd)d#�Zed*d$��Zy%)+r�a�
    An observation's missing value.

    Parameters
    ----------
    value : {int, float}
        The Stata missing value code

    Notes
    -----
    More information: <https://www.stata.com/help.cgi?missing>

    Integer missing values map the codes '.', '.a', ..., '.z' to the ranges
    101 ... 127 (for int8), 32741 ... 32767  (for int16) and 2147483621 ...
    2147483647 (for int32).  Missing values for floating point data types are
    more complex but the pattern is simple to discern from the following table.

    np.float32 missing values (float in Stata)
    0000007f    .
    0008007f    .a
    0010007f    .b
    ...
    00c0007f    .x
    00c8007f    .y
    00d0007f    .z

    np.float64 missing values (double in Stata)
    000000000000e07f    .
    000000000001e07f    .a
    000000000002e07f    .b
    ...
    000000000018e07f    .x
    000000000019e07f    .y
    00000000001ae07f    .z
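
    The pattern in the two tables can be reproduced with ``struct``: each
    successive code adds a fixed step to the little-endian bit pattern of
    the base value. The sketch below is illustrative only; the function
    names are hypothetical and the step sizes are read off the tables above.

```python
import struct


def missing_codes() -> list[str]:
    """The 27 Stata missing codes: '.', '.a', ..., '.z'."""
    return ["."] + ["." + chr(ord("a") + i) for i in range(26)]


def int_missing(first: int) -> dict[int, str]:
    """Map consecutive integers starting at `first` to the 27 codes."""
    return {first + i: code for i, code in enumerate(missing_codes())}


def float32_missing_hex() -> dict[str, str]:
    """Little-endian hex of each float32 missing value, keyed to its code."""
    base = struct.unpack("<i", b"\x00\x00\x00\x7f")[0]
    step = 0x0800  # second byte advances by 0x08 per code
    return {
        struct.pack("<i", base + i * step).hex(): code
        for i, code in enumerate(missing_codes())
    }


def float64_missing_hex() -> dict[str, str]:
    """Little-endian hex of each float64 missing value, keyed to its code."""
    base = struct.unpack("<q", b"\x00\x00\x00\x00\x00\x00\xe0\x7f")[0]
    step = 1 << 40  # sixth byte advances by 0x01 per code
    return {
        struct.pack("<q", base + i * step).hex(): code
        for i, code in enumerate(missing_codes())
    }
```

    ``int_missing(101)`` covers the int8 range 101 to 127, and the two
    float helpers return keys in the same byte order as the tables.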
    rB�MISSING_VALUES)�e����r�bases�.rB��`�r5�float32_basez<isrr`�increment_32r�r��float64_base�qsr�rErFrG)r�r�r�r�r�r�c�p�||_|dkrt|�n
t|�}|j||_y)Nl)�_valuer`r�rD�_str�rras  rWrzStataMissingValue.__init__�s1�����#�j�0��E�
�e�E�l���'�'��.��	r�c��|jS)z�
        The Stata representation of the missing value: '.', '.a'..'.z'

        Returns
        -------
        str
            The representation of the missing value.
        )rS�rs rW�stringzStataMissingValue.string�s���y�y�r�c��|jS)z�
        The binary representation of the missing value.

        Returns
        -------
        {int, float}
            The binary representation of the missing value.
        )rRrVs rWrazStataMissingValue.value�s���{�{�r�c��|jS�N)rWrVs rW�__str__zStataMissingValue.__str__�s���{�{�r�c�$�t|��d|�d�S)N�(r�)�typerVs rW�__repr__zStataMissingValue.__repr__�s���t�*��Q�t�f�A�&�&r�c��t|t|��xr4|j|jk(xr|j|jk(SrZ)r�r^rWra)r�others  rW�__eq__zStataMissingValue.__eq__�s?���u�d�4�j�)�
*����u�|�|�+�
*��
�
�e�k�k�)�	
r�c���|jtjur|jd}|S|jtjur|jd}|S|jtj
ur|jd}|S|jtjur|jd}|S|jtjur|jd}|Std��)Nr�r�r�r�r�zUnsupported dtype)	r^r�r�r�r�r�r�r�rf)�clsrvras   rW�get_base_missing_valuez(StataMissingValue.get_base_missing_value�s����:�:���� ��+�+�F�3�E����Z�Z�2�8�8�
#��+�+�G�4�E����Z�Z�2�8�8�
#��+�+�G�4�E���
�Z�Z�2�:�:�
%��+�+�I�6�E�
��	�Z�Z�2�:�:�
%��+�+�I�6�E����0�1�1r�N)rar�r�r3�r�r)r�r�)rar�r�r�)rv�np.dtyper�r�)r6r7r8r9rD�__annotations__rH�br(r �chrrMr�r�rNr?r&�	int_valuerO�increment_64r�r�propertyrWrar[r_rb�classmethodrer:r�rWr�r�7sQ��"�J(*�N�$�)�+�E�5�+�
����q���q�"��A�$'�#�b�1�f�+�$5�N�1�q�5�!���.�L�%�-�%��
�
�d�,?�@��C�L�#�C�
�2�Y���f�m�m�D�,�/��2��!��s���q�5��3��3�r�A�v�;�.��!�F�M�M�$�����D�#�(>�?��B�\�Q�	�"�v�{�{�4��3��
�>�L�%�=� �6�=�=��&I�J�1�M�L�
�2�Y���f�m�m�D�,�/��2��!��s���q�5��3��3�r�A�v�;�.��!�F�M�M�#�{�v�{�{�4��'=�>�q�A�L�P�	�"�v�{�{�3�	�2��
���� �6�=�=��|�4�Q�7� �6�=�=��|�4�Q�7�"����/��	��	��	��	��'�
��
��
r�r�c��eZdZdd�Zy)�StataParserc��ttdd�D�cgc]}|tjd|���f��c}dtjtj�fdtjtj
�fdtjtj�fdtjtj�fdtjtj�fgz�|_	tjtj�tjtj�tjtj�tjtj�tjtj
�tjtj�d	�|_tttd��td
�z�|_ddd
dddd	�|_d}d}d}d}dddtjt!j"d|�d�tjt!j"d|�d�ftjt!j"d|�d�tjt!j"d|�d�fd�|_dddddd�|_dddtjt!j"dd �d�tjt!j"dd!�d�d�|_d"d#d$d%d&d'd(�|_hd)�|_ycc}w)*NrB��S�����)��������bhlfd�Qr[r��l�hris����r�s�������r�)r�rH)r�r�)r�r�r�rr�)rir�r�r�r[)�b�i�l�frHrErFrGrLr��i1�i2�i4�f4�f8�u8)rir�r�r�r[r�><�str#�_N�_b�_n�do�if�in�_pi�_rc�_se�end�forr`�NULL�_all�byte�case�else�enum�goto�long�quad�strL�with�_coef�_cons�_pred�_skipr�break�catch�class�constr��local�short�using�delete�double�export�friend�global�inline�pragma�boolean�complex�default�typedef�virtual�continue�delegate�explicit�external�function�typename�	aggregate�	colvector�	eltypedef�	protected�	rowvector)�dictr(r�rvr�r�r�r�r��	DTYPE_MAPr��
DTYPE_MAP_XML�list�tuple�TYPE_MAP�TYPE_MAP_XMLr�r��VALID_RANGE�OLD_TYPE_MAPPINGrD�NUMPY_TYPE_MAP�RESERVED_WORDS)rr �float32_minr��float64_minr�s      rWrzStataParser.__init__�s��� �-2�1�c�]�;�]��a����A�a�S�'�"�
#�]�;��b�h�h�r�w�w�'�(��b�h�h�r�x�x�(�)��b�h�h�r�x�x�(�)��b�h�h�r�z�z�*�+��b�h�h�r�z�z�*�+��
�	
����8�8�B�H�H�%��8�8�B�J�J�'��8�8�B�J�J�'��8�8�B�H�H�%��8�8�B�H�H�%��8�8�B�G�G�$�
3
����U�5��:�.��w��?�@��
�������
���*��)��9��9��� �*��
�
�6�=�=��{�;�A�>�?��
�
�6�=�=��{�;�A�>�?��
�
�
�6�=�=��{�;�A�>�?��
�
�6�=�=��{�;�A�>�?��
��������!
���������F�M�M�$�0C�D�Q�G�H�����
�
�d�$G�H��K��
���������

���=
����[
<s�!MNr4)r6r7r8rr:r�rWrprp�s��[
r�rpc�X��eZdZUeZded<										d/																							d0�fd�
Zd1d�Zd1d�Zd2d�Z									d3d�Z
d1d�Zd1d	�Zd4d
�Z
d4d�Zd4d�Zd4d
�Zd4d�Zd4d�Zd4d�Zd4d�Zd5d�Zd6d�Zd1d�Zd1d�Z				d7d�Zd8d�Zd8d�Zd8d�Zd8d�Zd4d�Zd9d�Zd9d�Z d4d�Z!d:d�Z"d;d �Z#d<d!�Z$d1d"�Z%d1d#�Z&d=d$�Z'd>d?d%�Z(e)e*�								d@																	dAd&��Z+dBd'�Z,dCd(�Z-dDd)�Z.										dEd*�Z/e0d9d+��Z1e0d9d,��Z2dFd-�Z3dGd.�Z4�xZ5S)H�StataReaderz	IO[bytes]�_path_or_bufc���t�|��||_||_||_||_||_||_||_||_	|
|_
||_d|_|	|_
d|_d|_|j�d|_
n t!|	t"�r|	dkrt%d��d|_d|_d|_d|_d|_d|_d|_d|_t7t8j:�|_y)Nr�FrBrz.chunksize must be a positive integer when set.)�superr�_convert_dates�_convert_categoricals�
_index_col�_convert_missing�_preserve_dtypes�_columns�_order_categoricals�_original_path_or_buf�_compression�_storage_optionsr�
_chunksize�_using_iterator�_enteredr�r`rf�_close_file�_missing_values�_can_read_value_labels�_column_selector_set�_value_labels_read�
_data_read�_dtype�_lines_read�_set_endianness�sysr*�_native_byteorder)
r�path_or_buf�
convert_dates�convert_categoricals�	index_col�convert_missing�preserve_dtypes�columns�order_categoricals�	chunksize�compressionr7�	__class__s
            �rWrzStataReader.__init__as����	����,���%9��"�#��� /��� /�����
�#5�� �%0��"�'��� /������#���$�����
��?�?�"��D�O��I�s�+�y�A�~��M�N�N�7;���$���&+��#�$)��!�"'������'+������!0����!?��r�c�>�t|d�s|j�yy)zK
        Ensure the file has been opened and its header data read.
        r�N)�hasattr�
_open_filerVs rW�_ensure_openzStataReader._ensure_open�s���t�^�,��O�O��-r�c�Z�|js$tjdtt	���t|jd|jd|j��}t|jd�r=|jj�r#|j|_|j|_nN|5t|jj!��|_ddd�|jj|_|j#�|j%�y#1swY�ExYw)z^
        Open the file (with compression options, etc.), and read header information.
        zStataReader is being used without using a context manager. Using StataReader as a context manager is the only supported method.rs�rbF)r7�is_textr��seekableN)r�r�r��ResourceWarningrr)r�r�r�r��handlerr��closer�r�read�_read_header�_setup_dtype)r�handless  rWr�zStataReader._open_file�s����}�}��M�M�W��+�-�	
���&�&�� �1�1���)�)�
���7�>�>�:�.�7�>�>�3J�3J�3L� '���D��&�}�}�D���$+�G�N�N�,?�,?�,A�$B��!��#�0�0�6�6�D������������s�4)D!�!D*c��d|_|S)zenter context managerT)r�rVs rW�	__enter__zStataReader.__enter__�s����
��r�c�>�|jr|j�yyrZ)r�)r�exc_type�	exc_value�	tracebacks    rW�__exit__zStataReader.__exit__�s���������r�c��tjdtt���|jr|j	�yy)z�Close the handle if its open.

        .. deprecated:: 2.0.0

           The close method is not part of the public API.
           The only supported way to use StataReader is to use it as a context manager.
        z�The StataReader.close() method is not part of the public API and will be removed in a future version without notice. Using StataReader as a context manager is the only supported method.rsN)r�r��
FutureWarningrr�rVs rWrzStataReader.close�s=��	�
�
�
S�
�'�)�	
��������r�c�@�|jdkrd|_yd|_y)zC
        Set string encoding which depends on file version
        �vrrN)�_format_versionrrVs rW�
_set_encodingzStataReader._set_encoding�s �����#�%�&�D�N�$�D�Nr�c�f�tjd|jjd��dS)NrirBr�r�r�r�rrVs rW�
_read_int8zStataReader._read_int8��)���}�}�S�$�"3�"3�"8�"8��";�<�Q�?�?r�c�f�tjd|jjd��dS)N�BrBrrrVs rW�_read_uint8zStataReader._read_uint8�rr�c��tj|j�d�|jj	d��dS)N�Hr�r�r�r��
_byteorderr�rrVs rW�_read_uint16zStataReader._read_uint16��5���}�}����0��2�D�4E�4E�4J�4J�1�4M�N�q�Q�Qr�c��tj|j�d�|jj	d��dS)N�Ir�rr rVs rW�_read_uint32zStataReader._read_uint32�r#r�c��tj|j�d�|jj	d��dS)Nr�rrr rVs rW�_read_uint64zStataReader._read_uint64�r#r�c��tj|j�d�|jj	d��dS)Nr�r�rr rVs rW�_read_int16zStataReader._read_int16�r#r�c��tj|j�d�|jj	d��dS)Nr r�rr rVs rW�_read_int32zStataReader._read_int32�r#r�c��tj|j�d�|jj	d��dS)NrPrrr rVs rW�_read_int64zStataReader._read_int64�r#r�c�f�tjd|jjd��dS)Nr$rBrrrVs rW�_read_char8zStataReader._read_char8�rr�c��tj|j�d|z��|jj	d|z��S)Nr�r�r )r�counts  rW�_read_int16_countzStataReader._read_int16_count�s@���}�}������e��}�-����"�"�1�u�9�-�
�	
r�c�r�|j�}|dk(r|j�y|j|�y)N�<)r0�_read_new_header�_read_old_header)r�
first_chars  rWrzStataReader._read_headers2���%�%�'�
�����!�!�#��!�!�*�-r�c��|jjd�t|jjd��|_|jdvr)t	t
j
|j����|j�|jjd�|jjd�dk(rdnd|_|jjd	�|jd
kr|j�n|j�|_|jjd�|j�|_
|jjd�|j�|_|jjd
�|j!�|_|jjd�|jjd�|jjd�|j%�dz|_|j%�dz|_|j%�dz|_|j%�dz|_|j%�d
z|_|j1�|_|jjd�|j%�dz|_|j%�dz|_|j%�dz|_|j;|j&�\|_|_|jjA|j(�|jC�|_"|jjA|j*�|jG|jdz�dd|_$|jjA|j,�|jK�|_&|jjA|j.�|jO�|_(|jjA|j2�|jS�|_*y)NrJr���ur�w��version�sMSF�>�<�rr{���r��
�	r��rB���)+r�rr`rrf�_version_errorrJrr!r"r&�_nvar�	_get_nobs�_nobs�_get_data_label�_data_label�_get_time_stamp�_time_stampr.�_seek_vartypes�_seek_varnames�_seek_sortlist�
_seek_formats�_seek_value_label_names�_get_seek_variable_labels�_seek_variable_labels�_data_location�_seek_strls�_seek_value_labels�_get_dtypes�_typlist�	_dtyplist�seek�_get_varlist�_varlistr3�_srtlist�_get_fmtlist�_fmtlist�_get_lbllist�_lbllist�_get_variable_labels�_variable_labelsrVs rWr6zStataReader._read_new_header	se�������r�"�"�4�#4�#4�#9�#9�!�#<�=�������6��^�2�2�4�;O�;O�2�P�Q�Q����������r�"�!%�!2�!2�!7�!7��!:�f�!D�#�#��������r�"�#'�#7�#7�3�#>�D����D�DU�DU�DW�	
�
�	
�����q�!��^�^�%��
������r�"��/�/�1��������r�"��/�/�1��������r�"������q�!������q�!�"�.�.�0�2�5���"�.�.�0�2�5���"�.�.�0�2�5���!�-�-�/�!�3���'+�'7�'7�'9�B�'>��$�&*�%C�%C�%E��"������q�!�"�.�.�0�1�4����+�+�-��1���"&�"2�"2�"4�r�"9���(,�(8�(8��9L�9L�(M�%��
�t�~������t�2�2�3��)�)�+��
������t�2�2�3��.�.�t�z�z�A�~�>�s��C��
������t�1�1�2��)�)�+��
������t�;�;�<��)�)�+��
������t�9�9�:� $� 9� 9� ;��r�c��|jj|�g}g}t|j�D]�}|j	�}|dkr,|j|�|jt
|���D	|j|j|�|j|j|���||fS#t$r}td|�d��|�d}~wwxYw)N��cannot convert stata types [�])r�r`r(rLr"rrr�r��KeyErrorrf)r�
seek_vartypes�typlist�dtyplist�_�typ�errs       rWr]zStataReader._get_dtypesAs���	
�����}�-������t�z�z�"�A��#�#�%�C��d�{����s�#�����C��)�U��N�N�4�#4�#4�S�#9�:��O�O�D�$6�$6�s�$;�<�#��� � �� �U�$�'C�C�5��%J�K�QT�T��U�s�:<B<�<	C�C�Cc���|jdkrdnd}t|j�D�cgc],}|j|jj|����.c}Scc}w)Nr�!��rr(rL�_decoder�r�rrirrs   rWrazStataReader._get_varlistUsS���&�&��,�B�#��AF�t�z�z�AR�S�AR�A����T�.�.�3�3�A�6�7�AR�S�S��Ss�1Ac��|jdk\rd}n&|jdkDrd}n|jdkDrd}nd}t|j�D�cgc],}|j|jj|����.c}Scc}w)Nr�9�q�1�hr~r{rxrzs   rWrdzStataReader._get_fmtlist[s}�����3�&��A�
�
!�
!�C�
'��A�
�
!�
!�C�
'��A��A�AF�t�z�z�AR�S�AR�A����T�.�.�3�3�A�6�7�AR�S�S��Ss�1Bc���|jdk\rd}n|jdkDrd}nd}t|j�D�cgc],}|j|jj|����.c}Scc}w)Nrrwr�rvrHrxrzs   rWrfzStataReader._get_lbllisthsj�����3�&��A�
�
!�
!�C�
'��A��A�AF�t�z�z�AR�S�AR�A����T�.�.�3�3�A�6�7�AR�S�S��Ss�1A2c�$�|jdk\rLt|j�D�cgc],}|j|jjd����.}}|S|jdkDrLt|j�D�cgc],}|j|jjd����.}}|St|j�D�cgc],}|j|jjd����.}}|Scc}wcc}wcc}w)NriAr��Qr!rx)rrr�vlblists   rWrhz StataReader._get_variable_labelsqs�����3�&�CH����CT��CT�a����T�.�.�3�3�C�8�9�CT�
�����
!�
!�C�
'�BG��
�
�BS��BS�Q����T�.�.�3�3�B�7�8�BS�
����CH��
�
�BS��BS�Q����T�.�.�3�3�B�7�8�BS�
���������s�1D�1D�1D
c�`�|jdk\r|j�S|j�S)Nr)rr(r&rVs rWrMzStataReader._get_nobs�s.�����3�&��$�$�&�&��$�$�&�&r�c���|jdk\r:|j�}|j|jj	|��S|jdk(r:|j�}|j|jj	|��S|jdkDr*|j|jj	d��S|j|jj	d��S)Nrr;r�r�r!)rr"ryr�rr�r�strlens  rWrOzStataReader._get_data_label�s������3�&��&�&�(�F��<�<�� 1� 1� 6� 6�v� >�?�?�
�
!�
!�S�
(��_�_�&�F��<�<�� 1� 1� 6� 6�v� >�?�?�
�
!�
!�C�
'��<�<�� 1� 1� 6� 6�r� :�;�;��<�<�� 1� 1� 6� 6�r� :�;�;r�c��|jdk\r:|j�}|jj|�j	d�S|jdk(r:|j�}|j|jj|��S|jdkDr*|j|jjd��St
��)Nrrr;r�)rrr�r�decoderyrfr�s  rWrQzStataReader._get_time_stamp�s������3�&��_�_�&�F��$�$�)�)�&�1�8�8��A�A�
�
!�
!�S�
(��_�_�&�F��<�<�� 1� 1� 6� 6�v� >�?�?�
�
!�
!�C�
'��<�<�� 1� 1� 6� 6�r� :�;�;��,�r�c���|jdk(r=|jjd�|jd|jzzdzdzS|jdk\r|j�dzSt
��)Nr;rrv��r)rr�rrWrLr.rfrVs rWrXz%StataReader._get_seek_variable_labels�ss�����3�&����"�"�1�%��/�/�2��
�
�?�C�b�H�2�M�M�
�
!�
!�S�
(��#�#�%��*�*��,�r�c	��t|d�|_|jdvr)ttj	|j����|j�|j
�dk(rdnd|_|j
�|_|jjd�|j�|_|j�|_|j�|_|j#�|_|jdkDr<|jj|j�D�cgc]
}t|���}}n�|jj|j�}t'j(|t&j*��}g}|D]C}||j,vr|j/|j,|��0|j/|d	z
��E	|D�cgc]}|j0|��c}|_	|D�cgc]}|j8|��c}|_|jdkDrQt=|j�D�cgc],}|j?|jjd����.c}|_ nPt=|j�D�cgc],}|j?|jjd����.c}|_ |jC|jdz�dd|_"|jG�|_$|jK�|_&|jO�|_(|jdkDrc	|j
�}
|jdkDr|jS�}n|jU�}|
dk(rn|jj|��b|jjW�|_,ycc}wcc}w#t$rC}d
j5|D�	cgc]
}	t7|	���ncc}	wc}	�}
td|
�d��|�d}~wwxYwcc}w#t$rC}d
j5|D�	cgc]
}	t7|	���ncc}	wc}	�}td
|�d��|�d}~wwxYwcc}wcc}w)Nr)rr�r��or}�r�sr=rBr@rAr�ru��,rlrmzcannot convert stata dtypes [rvrHrJr)-r`rrfrKrJrrr!�	_filetyper�rr"rLrMrNrOrPrQrRr��
frombufferr�r�rr�r^�joinrr�r_r(ryrbr3rcrdrerfrgrhrir,r*�tellrZ)rr8r$rp�buf�typlistb�tprsrtr��
invalid_types�invalid_dtypesrr�	data_type�data_lens               rWr7zStataReader._read_old_header�s���"�:�a�=�1������'J�J��^�2�2�4�;O�;O�2�P�Q�Q�����!%���!2�c�!9�#�s������*��������q�!��&�&�(��
��^�^�%��
��/�/�1����/�/�1������#�%�'+�'8�'8�'=�'=�d�j�j�'I�J�'I�!�s�1�v�'I�G�J��#�#�(�(����4�C��}�}�S����9�H��G�����.�.�.��N�N�4�#8�#8��#<�=��N�N�2��8�,�	�	W�;B�C�7�C�T�]�]�3�/�7�C�D�M�	Y�=D�E�W�c�d�n�n�S�1�W�E�D�N�
���#�%�BG��
�
�BS��BS�Q����T�.�.�3�3�B�7�8�BS��D�M�
BG�t�z�z�AR��AR�A����T�.�.�3�3�A�6�7�AR��D�M��.�.�t�z�z�A�~�>�s��C��
��)�)�+��
��)�)�+��
� $� 9� 9� ;������#�%�� �O�O�-�	��'�'�#�-�#�/�/�1�H�#�/�/�1�H���>���!�!�&�&�x�0��#�/�/�4�4�6����oK��D���	W��H�H�g�%>�g��c�!�f�g��%>�?�M��;�M�?�!�L�M�SV�V��	W��F���	Y� �X�X�w�&?�w�!�s�1�v�w��&?�@�N��<�^�<L�A�N�O�UX�X��	Y��
��s~�1O�O�O�5O�>P%�P �P%�1Q4�1Q9�O�	P�P�)O<
�;P�P� P%�%	Q1�.Q,�=Q
�Q,�,Q1c��|j�|jSg}t|j�D]n\}}||jvrBt	t
|�}|j
d|��|j�|j|��f��V|j
d|��d|��f��ptj|�|_|jS)z"Map between numpy and state dtypes�srs)
r�rr^r�r
rrr!r�rv)r�dtypesr rss    rWrzStataReader._setup_dtype�s����;�;�"��;�;������
�
�.�F�A�s��d�)�)�)��3��n���
�
��1�#�w�4�?�?�*;�D�<O�<O�PS�<T�;U�(V�W�X��
�
��1�#�w�!�C�5�	�2�3�/��h�h�v�&����{�{�r�c��|jd�d}	|j|j�S#t$rJ|j}d|�d�}t	j
|tt���|jd�cYSwxYw)Nrrz@
One or more strings in the dta file could not be decoded using z�, and
so the fallback encoding of latin-1 is being used.  This can happen when a file
has been incorrectly encoded by Stata or some other software. You should verify
the string values returned are correct.rsr)�	partitionr�r�UnicodeDecodeErrorr�r��UnicodeWarningr)rr�r
�msgs    rWryzStataReader._decodes���
�K�K���q�!��	'��8�8�D�N�N�+�+��!�	'��~�~�H�@�@H�z�J(�+�C�

�M�M���+�-�
�
�8�8�I�&�&�	'�s�1�AB�Bc�.�|j�|jry|jdkrd|_i|_y|jdk\r&|jj|j�nY|j�J�|j|jjz}|jj|j|z�d|_i|_	|jdk\r'|jjd�dk(r		d|_y|jjd�}|s		d|_y|jdkr+|j|jjd��}n*|j|jjd��}|jjd	�|j�}|j�}tj|jjd|z�|j �d
�|��}tj|jjd|z�|j �d
�|��}tj"|�}||}||}|jj|�}	i|j|<t%|�D]>}
|
|dz
kr||
dzn|}|j|	||
|�|j|||
<�@|jdk\r|jjd
���>)Nr�Tr;�s</valr�rvrwr�r��rvr2rBr�)r�r�r�_value_label_dictr�r`r\r�rN�itemsizerZrryr&r�r�r!�argsortr()rr.�slengthrr�txtlenrr�iirr r�s            rW�_read_value_labelszStataReader._read_value_labelss��������"�"�����3�&�&*�D�#�BD�D�"�����3�&����"�"�4�#:�#:�;��;�;�*�*�*��Z�Z�$�+�+�"6�"6�6�F����"�"�4�#6�#6��#?�@�"&���!#�����#�#�s�*��$�$�)�)�!�,��8��>#'���;�'�'�,�,�Q�/�G���6#'���5�#�#�s�*��,�,�t�'8�'8�'=�'=�b�'A�B���,�,�t�'8�'8�'=�'=�c�'B�C�����"�"�1�%��!�!�#�A��&�&�(�F��-�-��!�!�&�&�q�1�u�-����7H��5K�ST��C��-�-��!�!�&�&�q�1�u�-����7H��5K�ST��C����C��B��b�'�C��b�'�C��#�#�(�(��0�C�.0�D�"�"�7�+��1�X��$%��A��I�c�!�a�%�j�6��:>�,�,���A���%�;��&�&�w�/��A��7��
�#�#�s�*��!�!�&�&�q�)�Cr�c���|jj|j�ddi|_	|jj	d�dk7ry|j
dk(r|j
�}ns|jj	d�}|j
dk(rdnd}|jd	k(r|d
||dd|z
z}n|d
||d|zdz}tjd|�d
}|j�}|j�}|jj	|�}|d
k(r|d
dj|j�}nt|�}||jt|�<��6)N�0r�r�sGSOr;r~rr�rArr�r��rJ)r�r`r[�GSOrrr(r!r�r�rr&r�rr)r�v_or��v_sizers�length�va�
decoded_vas        rW�_read_strlszStataReader._read_strlsSsX�������t�/�/�0���9����� � �%�%�a�(�F�2���#�#�s�*��'�'�)���'�'�,�,�R�0��"�2�2�c�9��q���?�?�c�)��a��-�#�a�2��;�*@�@�C��a��-�#�q�6�z�n�*=�=�C��m�m�C��-�a�0���"�"�$�C��&�&�(�F��"�"�'�'��/�B��c�z���"�X�_�_�T�^�^�<�
�!��W�
�!+�D�H�H�S��X��3r�c�H�d|_|j|j��S)NT��nrows)r�rr�rVs rW�__next__zStataReader.__next__rs��#����y�y�t���y�/�/r�c�B�|�|j}|j|��S)a
        Read lines from a Stata file and return them as a DataFrame.

        Parameters
        ----------
        size : int, defaults to None
            Number of lines to read.  If None, reads whole file.

        Returns
        -------
        DataFrame
        r�)r�r)r�sizes  rW�	get_chunkzStataReader.get_chunkvs#���<��?�?�D��y�y�t�y�$�$r�c		��
��|j�|�|j}|�|j}|�|j}|�|j}|�|j
}|�|j}|�|j}|�|j}|jdk(r�|dk(r�d|_	d|_
t|j��}	t|	j�D]V\}
}|j|
}t!|t"j$�s�0|j&dk7s�@|	|j)|�|	|<�X|�|j+|	|�}	|	S|j,dk\r#|j.sd|_	|j1�|j2�J�|j2}
|j|j4z
|
j6z}||
j6z}t9||�}|dkr|r|j;�t<�|j4|
j6z}|j>jA|jB|z�t9||j|j4z
�}t#jD|j>jG|�|
|��}|xj4|z
c_|j4|jk(rd|_	d|_
|jH|jJk7r7|jM�jO|j$jQ��}|r|j;�tS|�dk(rt|j��}	n/tjT|�}	tW|j�|	_|�(tY|j4|z
|j4�|	_-|�|j+|	|�}	t]|	|j^�D]7\}}t!|t`�s�|	|jc|jd�|	|<�9|jg|	�}	t|j�D�
�cgc]
\}
}|��	|
��}}
}t#j$th�}|D]e}|	jjdd�|fj$}
|
||j|fvs�4|	jm||	jjdd�|fj)|
���g|jo|	|�}	|rct|jp�D]K\}
�ts�fd�ttD��s�|	jm|
tw|	jjdd�|
f����M|r7|j,dkDr(|jy|	|jz|j||�}	|�s^g}d	}|	D�]4}|	|j$}
|
t#j$t"j~�t#j$t"j��fvr&t#j$t"j��}
d}n�|
t#j$t"j��t#j$t"j��t#j$t"j��fvr%t#j$t"j��}
d}|j�||	|j)|
�f���7|rtj�t�|��}	|� |	j�|	j�|��}	|	Scc}}
w)
NrT)r�rsr;r�c3�@�K�|]}�j|����y�wrZ)r�)�.0�date_fmtr�s  �rW�	<genexpr>z#StataReader.read.<locals>.<genexpr>�s�����N�
�H�s�~�~�h�/�
�s�r�F)Kr�r�r�r�r�r�r�r�rNr�r�r$rbrr�r_r�r�rv�charr��_do_select_columnsrr�r�r�r�r�rNr��
StopIterationr�r`rZr�rr!r��byteswapr��newbyteorderr�from_recordsr%r&rKrPr^r`r�ry�
_insert_strlsr��iloc�isetitem�_do_convert_missingrer��
_date_formatsr��_do_convert_categoricalsr�rg�float16r�r�r�r�r�r�r�	from_dictr��	set_index�pop)rr�r�r�r�r�r�r�r�r�r r��dtrv�max_read_len�read_lenr.�
read_lines�raw_datars�dtyp�valid_dtypes�object_type�idx�retyped_data�convertr�s                          @rWrzStataReader.read�s����	
����� � �/�/�M��'�#'�#=�#=� ��"�"�3�3�O��"�"�3�3�O��?��m�m�G��%�!%�!9�!9�������I��=��J�J�E�

�J�J�!�O��!��*.�D�'�"�D�O��T�]�]�3�D�#�D�L�L�1���3��^�^�A�&���b�"�(�(�+��w�w�#�~�$(��I�$4�$4�R�$8��S�	�	2�
�"��.�.�t�W�=���K�� � �C�'�$�2I�2I�*.�D�'������{�{�&�&�&������
�
�T�%5�%5�5����G���5�>�>�)���x��.���q�=�$��'�'�)����!�!�E�N�N�2�������t�2�2�V�;�<����
�
�T�-=�-=� =�>�
��=�=����"�"�8�,�E��
��	
���J�&�����t�z�z�)�*.�D�'�"�D�O��?�?�d�4�4�4��(�(�*�/�/����0K�0K�0M�N�H���#�#�%��x�=�A���T�]�]�3�D��)�)�(�3�D� ����/�D�L���#�� � �:�-�t�/?�/?��D�J����*�*�4��9�D��D�$�-�-�0�H�C���#�s�#� ��I�O�O�D�L�L�9��S�	�1��!�!�$�'��*3�4�>�>�)B�W�)B�g�a��d�FV��)B��W��h�h�v�&���C��I�I�a��f�%�+�+�E��[�$�.�.��*=�>�>��
�
�c�4�9�9�Q��V�#4�#;�#;�E�#B�C� �
�'�'��o�>���#�D�M�M�2���3��N�
�N�N��M�M��>�t�y�y��A���PS�T��3� �D�$8�$8�3�$>��0�0��d�,�,�d�m�m�=O��D���L��G����S�	�����R�X�X�b�j�j�1�2�8�8�B�J�J�3G�H�H��H�H�R�Z�Z�0�E�"�G���H�H�R�W�W�%��H�H�R�X�X�&��H�H�R�X�X�&���
�H�H�R�X�X�.�E�"�G��#�#�S�$�s�)�*:�*:�5�*A�$B�C��� �*�*�4��+=�>��� ��>�>�$�(�(�9�"5�6�D����WXs�&
[0�1[0c���i}tt|j��D�]�}|j|}||jvr�"tt|�}|j|\}}|jdd�|f}|j}	|	|k|	|kDz}
|
j�s�|r�tjtj|
��d}tj||
d��\}}
t|t��}t!|�D]'\}}t#|�}||
|k(}||j|<�)n�|j$}|tj&tj(fvrtj(}t||��}|jj*ds|j-�}tj.|j|
<|||<���|r*|j1�D]\}}|j3||��|S)NrT)�return_inverseru�	WRITEABLE)r(rr�r^r�r
rr�r�r�r��nonzero�asarray�uniquer'r�rr�rvr�r��flagsr��nanrAr�)rr�r��replacementsr r��nmin�nmax�series�svals�missing�missing_loc�umissing�umissing_loc�replacement�j�umr�r�rvr�ras                      rWr�zStataReader._do_convert_missings������s�4�<�<�(�)�A��-�-��"�C��$�*�*�*���s�C�.�C��)�)�#�.�J�D�$��Y�Y�q�!�t�_�F��N�N�E��t�|����5�G��;�;�=��� �j�j����G�)<�=�a�@��)+���6�'�?�SW�)X�&��,�$�V�6�:��&�x�0�E�A�r�$5�b�$9�M�%�l�a�&7�8�C�,9�K�$�$�S�)�	1���������R�Z�Z� 8�8��J�J�E�$�V�5�9��"�*�*�0�0��=�#.�"2�"2�"4�K�02�v�v��#�#�G�,�)�L��O�K*�L�*�0�0�2�
��U��
�
�c�5�)�3��r�c�0�t|d�rt|j�dk(r|St|j�D]R\}}|dk7r�|j||jdd�|fD�cgc]}|jt|���c}��T|Scc}w)Nr�rr�)r�rr�rr^r�r�r)rr�r rs�ks     rWr�zStataReader._insert_strlsMs����t�U�#�s�4�8�8�}��'9��K���
�
�.�F�A�s��c�z���M�M�!��	�	�!�Q�$��H��1�d�h�h�s�1�v�.��H�I�	/�
���Is�)Bc��|j�s7t|�}t|�t|�k7rtd��|j	|j
�}|r(dj
t|��}td|����g}g}g}g}	|D]�}
|j
j|
�}|j|j|�|j|j|�|j|j|�|	j|j|���||_
||_||_|	|_
d|_||S)Nz"columns contains duplicate entriesz, z<The following columns were not found in the Stata data set: T)r��setrrf�
differencer�r�r��get_locrr_r^rerg)rr�r��
column_set�	unmatched�joinedrqrp�fmtlist�lbllistr�r s            rWr�zStataReader._do_select_columnsWs-���(�(��W��J��:��#�g�,�.� �!E�F�F�"�-�-�d�l�l�;�I�����4�	�?�3�� �4�4:�8�=���
�H��G��G��G����L�L�(�(��-��������q� 1�2����t�}�}�Q�/�0����t�}�}�Q�/�0����t�}�}�Q�/�0��&�D�N�#�D�M�#�D�M�#�D�M�(,�D�%��G�}�r�c��|s|Sg}t||�D�]c\}}||v�rB||}tjt|j	���}	||}
|
j|	�}|jr|j�r|	}n6|jr(tjttt���d}t|
||��}
|�>g}|
jD],}||vr|j||��|j|��.nt|j!��}	|
j#|�}
t'|
|j*d��}|j||f���N|j|||f���ft/t1|�d��}|S#t$$rd}t'|d��j)�}t|j*|dkD�}ddj-|�z}d	|�d
|�d�}t%|�|�d}~wwxYw)zC
        Converts categorical columns to Categorical type.
        rsN)r�orderedF)r�rBzQ--------------------------------------------------------------------------------
r4z
Value labels for column a are not unique. These cannot be converted to
pandas categoricals.

Either read the file with `convert_categoricals` set to False or use the
low level interface in `StataReader` to separately read the values and the
value_labels.

The repeated labels are:
r�)rPr�rr��keys�isinr��allr�r�r�rrrrrri�rename_categoriesrfr'�value_countsrKr�r$r�)rr��value_label_dictrr��cat_converted_datar��labelrr�column�key_matches�initial_categories�cat_datarrrt�vc�
repeated_cats�repeatsr��
cat_seriess                      rWr�z$StataReader._do_convert_categoricalsws�� ��K����d�G�,�J�C���(�(�%�e�,���x�x��R�W�W�Y��0���c���$�k�k�$�/���'�'�K�O�O�,=�<@�&��+�+� �
�
�:�8�'7�'9��
*.�&�&��'9�CU���&�-�!#�J�$,�$7�$7��#�r�>�&�-�-�b��l�;�&�-�-�h�7�	%8�"&�b�i�i�k�!2�J�3� (�9�9�*�E�H�&$�H�D�J�J�U�K�
�"�)�)�3�
�*;�<�"�)�)�3��S�	�*:�;�{-�|��0�1��>�����/"�3��
��7�D�D�F�B�$(����"�q�&�)9�$:�M�-��	�	�-�0H�H�G�����	�	�
�
�C�%�S�/�s�2��!3�s�#F�	G?�AG:�:G?c�:�|j�|jS)a�
        Return data label of Stata file.

        Examples
        --------
        >>> df = pd.DataFrame([(1,)], columns=["variable"])
        >>> time_stamp = pd.Timestamp(2000, 2, 29, 14, 21)
        >>> data_label = "This is a data file."
        >>> path = "/My_path/filename.dta"
        >>> df.to_stata(path, time_stamp=time_stamp,    # doctest: +SKIP
        ...             data_label=data_label,  # doctest: +SKIP
        ...             version=None)  # doctest: +SKIP
        >>> with pd.io.stata.StataReader(path) as reader:  # doctest: +SKIP
        ...     print(reader.data_label)  # doctest: +SKIP
        This is a data file.
        )r�rPrVs rW�
data_labelzStataReader.data_label�s��$	
�������r�c�:�|j�|jS)z2
        Return time stamp of Stata file.
        )r�rRrVs rW�
time_stampzStataReader.time_stamp�s��
	
�������r�c�t�|j�tt|j|j��S)a�
        Return a dict associating each variable name with corresponding label.

        Returns
        -------
        dict

        Examples
        --------
        >>> df = pd.DataFrame([[1, 2], [3, 4]], columns=["col_1", "col_2"])
        >>> time_stamp = pd.Timestamp(2000, 2, 29, 14, 21)
        >>> path = "/My_path/filename.dta"
        >>> variable_labels = {"col_1": "This is an example"}
        >>> df.to_stata(path, time_stamp=time_stamp,  # doctest: +SKIP
        ...             variable_labels=variable_labels, version=None)  # doctest: +SKIP
        >>> with pd.io.stata.StataReader(path) as reader:  # doctest: +SKIP
        ...     print(reader.variable_labels())  # doctest: +SKIP
        {'index': '', 'col_1': 'This is an example', 'col_2': ''}
        >>> pd.read_stata(path)  # doctest: +SKIP
            index col_1 col_2
        0       0    1    2
        1       1    3    4
        )r�r�rPrbrirVs rW�variable_labelszStataReader.variable_labels�s,��0	
�����C��
�
�t�'<�'<�=�>�>r�c�R�|js|j�|jS)aX
        Return a nested dict associating each variable name to its value and label.

        Returns
        -------
        dict

        Examples
        --------
        >>> df = pd.DataFrame([[1, 2], [3, 4]], columns=["col_1", "col_2"])
        >>> time_stamp = pd.Timestamp(2000, 2, 29, 14, 21)
        >>> path = "/My_path/filename.dta"
        >>> value_labels = {"col_1": {3: "x"}}
        >>> df.to_stata(path, time_stamp=time_stamp,  # doctest: +SKIP
        ...             value_labels=value_labels, version=None)  # doctest: +SKIP
        >>> with pd.io.stata.StataReader(path) as reader:  # doctest: +SKIP
        ...     print(reader.value_labels())  # doctest: +SKIP
        {'col_1': {3: 'x'}}
        >>> pd.read_stata(path)  # doctest: +SKIP
            index col_1 col_2
        0       0    1    2
        1       1    x    4
        )r�r�r�rVs rWr	zStataReader.value_labels�s%��0�&�&��#�#�%��%�%�%r�)
TTNFTNTN�inferN)r��FilePath | ReadBuffer[bytes]r�r�r�r�r��
str | Noner�r�r�r�r��Sequence[str] | Noner�r�r��
int | Noner�r.r7�StorageOptions | Noner�r3r4)r�r1)r
ztype[BaseException] | NonerzBaseException | NonerzTracebackType | Noner�r3)r�r`)r�r5)r2r`r�ztuple[int, ...])ror`r�z,tuple[list[int | str], list[str | np.dtype]])r�z	list[str]rf)r8r5r�r3)r�rg)r�r5r�r)r�r$rZ)r�rr�r$)NNNNNNNN)r�rr��bool | Noner�r r�rr�r r�r r�rr�r r�r$)r�r$r�r�r�r$�r�r$r�r$)r�r$r��
Sequence[str]r�r$)
r�r$r�dict[str, dict[float, str]]rr"r�r�r�r$)r�zdict[str, str])r�r#)6r6r7r8�_stata_reader_docr9rhrr�r�rrrrrrr"r&r(r*r,r.r0r3rr6r]rardrfrhrMrOrQrXr7rryr�r�r�r�r�_read_method_docrr�r�r�r�rmrrrr	�
__classcell__�r�s@rWr�r�\s�����G���
#�%)� $� %� $�(,�#'� $�*1�15�/@�1�/@��/@�#�	/@�
�/@��
/@��/@�&�/@�!�/@��/@�(�/@�/�/@�
�/@�b��>�
�,��(��(�	�

���$%�@�@�R�R�R�R�R�R�@�
�.�5<�p!� �!�	5�!�(T�
T�T�
�'�
<�
�
�I7�V� '�*7'�r,�>0�%�"���!�%)�,0� $�'+�'+�(,�*.�U��U�#�U�*�	U�
�U�%�
U�%�U�&�U�(�U�
�U� �U�n,�\��@L��L�6�L��	L�
!�L�
�
L�\� �� �(� �� �?�6&r�r�TFr)r�r�r�r�r�r�r�r��iteratorr�r7c
��t|||||||||||
��}|	s|r|S|5|j�cddd�S#1swYyxYw)N)
r�r�r�r�r�r�r�r�r7r�)r�r)
r6r�r�r�r�r�r�r�r�r(r�r7�readers
             rW�
read_statar+sQ�� ��#�1��'�'��-��'���F��9��
�	��{�{�}�
���s	�9�Ac�l�|j�dvry|j�dvrytd|�d���)N)rA�littlerA)r@�bigr@zEndianness r�)�lowerrf)�
endiannesss rWr�r�@s>������_�,��	�	�	�	�|�	+���;�z�l�/�B�C�Cr�c�r�t|t�r|d|t|�z
zzS|d|t|�z
zzS)zQ
    Take a char string and pad it with null bytes until it is `length` chars long.
    r�)r�r5r�r�r�s  rWr'r'Is@���$����g��#�d�)�!3�4�4�4��&�F�S��Y�.�/�/�/r�c�n�|dvr#tjtj�Std|�d���)zK
    Convert from one of the Stata date formats to a type in TYPE_MAP.
    )rpr8rwr:ryr<r}r=r�r>r�r?r�r@r�z not implemented)r�rvr��NotImplementedError)r�s rW�_convert_datetime_to_stata_typer6Rs;����� �x�x��
�
�#�#�!�G�C�5�0@�"A�B�Br�c��i}|D]|}||jd�sd||z||<||vr&|j|j|�||i��Lt|t�std��|j|||i��~|S)N�%z0convert_dates key must be a column or an integer)r��updaterKr�r`rf)r��varlist�new_dictr?s    rW�_maybe_convert_to_int_keysr<ks����H����S�!�,�,�S�1�!$�}�S�'9�!9�M�#���'�>��O�O�W�]�]�3�/��s�1C�D�E��c�3�'� �!S�T�T��O�O�S�-��"4�5�6���Or�c���|jtjur*tt	|j
��}t
|d�S|jtjury|jtjury|jtjury|jtjury|jtjurytd|�d���)	a�
    Convert dtype types to Stata types. Returns the byte of the given ordinal.
    See TYPE_MAP and comments for an explanation. This is also explained in
    the dta spec.
    1 - 244 are strings of this length
              Pandas    Stata
    251 - for int8      byte
    252 - for int16     int
    253 - for int32     long
    254 - for float32   float
    255 - for float64   double

    If there are dates to convert, then dtype will already have the correct
    type inserted.
    rBrxrwrvrurt�
Data type � not supported.�
r^r��object_rrr�rMr�r�r�r�r�r5)rvrr�s   rW�_dtype_to_stata_typerBys���"
�z�z�R�Z�Z��(�
�f�n�n�(E�F���8�Q���	���r�z�z�	!��	���r�z�z�	!��	���r�x�x�	��	���r�x�x�	��	���r�w�w�	��!�J�u�g�_�"E�F�Fr�c��|dkrd}nd}|ry|jtjurltt	|j
��}||kDr.|dk\ryt
tj|j���dtt|d��zdzS|tjk(ry|tjk(ry	|tjk(ry
|tjtj fvryt#d|�d
���)a�
    Map numpy dtype to stata's default format for this type. Not terribly
    important since users can change this in Stata. Semantics are

    object  -> "%DDs" where DD is the length of the string.  If not a string,
                raise ValueError
    float64 -> "%10.0g"
    float32 -> "%9.0g"
    int64   -> "%9.0g"
    int32   -> "%12.0g"
    int16   -> "%8.0g"
    int8    -> "%8.0g"
    strl    -> "%9s"
    r;��rkz%9sr8rBr�z%10.0gz%9.0gz%12.0gz%8.0gr>r?)r^r�rArrr�rfr�rJr�rrMr�r�r�r�r�r5)rvr�dta_version�
force_strl�max_str_lenr�s      rW�_dtype_to_default_stata_fmtrH�s���&�S���������z�z�R�Z�Z��'�
�f�n�n�(E�F���k�!��c�!�� �!>�!E�!E�f�k�k�!R�S�S��S��X�q�)�*�*�S�0�0�	�"�*�*�	��	�"�*�*�	��	�"�(�(�	��	�2�7�7�B�H�H�%�	%��!�J�u�g�_�"E�F�Fr��compression_options�fname)r7rIc���eZdZUdZdZdZded<								d(dd�																							d)�fd�Zd*d	�Zd+d
�Z					d,d�Z
d-d�Zd-d
�Zd.d�Z
d/d�Zd-d�Zd0d�Zd1d�Zd.d�Zd.d�Zd.d�Zd.d�Zd.d�Zd.d�Zd.d�Zd.d�Zd.d�Z		d2					d3d�Zd.d�Zd.d�Zd.d�Zd.d �Zd.d!�Z d.d"�Z!d-d#�Z"d4d$�Z#d5d%�Z$e%d6d&��Z&d7d'�Z'�xZ(S)8�StataWriterar
    A class for writing Stata binary dta files

    Parameters
    ----------
    fname : path (string), buffer or path object
        string, path object (pathlib.Path or py._path.local.LocalPath) or
        object implementing a binary write() function. If using a buffer
        then the buffer will not be automatically closed after the file
        is written.
    data : DataFrame
        Input to save
    convert_dates : dict
        Dictionary mapping columns containing datetime types to stata internal
        format to use when writing the dates. Options are 'tc', 'td', 'tm',
        'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name.
        Datetime columns that do not have a conversion type specified will be
        converted to 'tc'. Raises NotImplementedError if a datetime column has
        timezone information
    write_index : bool
        Write the index to Stata dataset.
    byteorder : str
        Can be ">", "<", "little", or "big". Default is `sys.byteorder`.
    time_stamp : datetime
        A datetime to use as file creation date.  Default is the current time
    data_label : str
        A label for the data set. Must be 80 characters or fewer.
    variable_labels : dict
        Dictionary containing columns as keys and variable labels as values.
        Each label must be 80 characters or fewer.
    {compression_options}

        .. versionchanged:: 1.4.0 Zstandard support.

    {storage_options}

    value_labels : dict of dicts
        Dictionary containing columns as keys and dictionaries of column value
        to labels as values. The combined length of all labels for a single
        variable must be 32,000 characters or fewer.

        .. versionadded:: 1.4.0

    Returns
    -------
    writer : StataWriter instance
        The StataWriter instance has a write_file method, which will
        write the file to the given `fname`.

    Raises
    ------
    NotImplementedError
        * If datetimes contain timezone information
    ValueError
        * Columns listed in convert_dates are neither datetime64[ns]
          nor datetime
        * Column dtype is not representable in Stata
        * Column listed in convert_dates is not in DataFrame
        * Categorical label contains more than 32,000 characters

    Examples
    --------
    >>> data = pd.DataFrame([[1.0, 1]], columns=['a', 'b'])
    >>> writer = StataWriter('./data_file.dta', data)
    >>> writer.write_file()

    Directly write a zip file
    >>> compression = {{"method": "zip", "archive_name": "data_file.dta"}}
    >>> writer = StataWriter('./data_file.zip', data, compression=compression)
    >>> writer.write_file()

    Save a DataFrame with dates
    >>> from datetime import datetime
    >>> data = pd.DataFrame([[datetime(2000,1,1)]], columns=['date'])
    >>> writer = StataWriter('./date_data_file.dta', data, {{'date' : 'tw'}})
    >>> writer.write_file()
    rDrr2rN�r	c����t�|��||_|�in||_||_||_||_||_||_g|_	tjgt��|_
|	|_d|_i|_|j#|�|
|_|�t&j(}t+|�|_||_tj0tj2tj4d�|_y)Nru)rvrurt)r�rr�r��_write_indexrRrPri�_non_cat_value_labels�
_value_labelsr�rr��_has_value_labelsr��_output_file�_converted_names�_prepare_pandasr7r�r*r�r!�_fnamer�r�r��type_converters)
rrJr�r��write_indexr*rrrr�r7r	r�s
            �rWrzStataWriter.__init__ 	s����	������	�$1�$9�b�}���'���%���%��� /���%1��"�46���!#���"�D�!9���'���.2���57������T�"�.������
�
�I�)�)�4������%'�X�X�B�H�H�2�7�7�K��r�c��|jjj|j|j��y)zS
        Helper to call encode before writing to file for Python 3 compat.
        N)r	rr%rr)r�to_writes  rW�_writezStataWriter._writeF	s)��	
�����!�!�(�/�/�$�.�.�"A�Br�c�N�|jjj|�y)z?
        Helper to assert file is open before writing.
        N)r	rr%rTs  rW�_write_byteszStataWriter._write_bytesL	s��	
�����!�!�%�(r�c��g}|j�|S|jj�D]�\}}||jvr|j|}n)||jvrt	|�}ntd|�d���t
||j�std|�d���t|||j�}|j|���|S)zc
        Check for value labels provided for non-categorical columns.
        zCan't create value labels for z!, it wasn't found in the dataset.z6, value labels can only be applied to numeric columns.)rPrArTr�rrnrrvrfr<rr)rr��non_cat_value_labelsr�labels�colname�svls       rW�_prepare_non_cat_value_labelsz)StataWriter._prepare_non_cat_value_labelsR	s���=?���%�%�-�'�'�#�9�9�?�?�A�O�G�V��$�/�/�/��/�/��8���D�L�L�(��g�,���4�W�I�>,�,���
$�D��M�$7�$7�8�!�4�W�I�>>�>���(������H�C� �'�'��,�' B�($�#r�c��|jD�cgc]}t|t���}}t|�s|S|xjtj|�zc_tj}g}t||�D�]�\}}|�r�t|||j��}|jj|�||jjj }|t
j"k(rt%d��||jjj&j)�}	|	j+�||�k\r�|t
j,k(r$tj t
j.�}nZ|t
j.k(r$tj t
j0�}n#tj t
j2�}tj|	|��}	||�|	|	dk(<|j||	f����|j|||f����t5j6t9|��Scc}w)z�
        Check for categorical columns, retain categorical information for
        Stata file and convert categorical data to int
        )r
zCIt is not possible to export int64-based categorical data to Stata.rurJ)r�r�rr�rRr�rr�rerPr�rrQrr�codesrvr�rfr�r�rMr�r�r�r�r$r�r�)
rr�rv�is_catre�data_formattedr��
col_is_catrbris
          rW�_prepare_categoricalsz!StataWriter._prepare_categoricalss	s���
DH�;�;�O�;�%�*�U�$4�5�;��O��6�{��K����"�(�(�6�"2�2��!2�!I�!I����"�4��0�O�C���%�d�3�i�$�.�.�I���"�"�)�)�#�.��S�	�
�
�+�+�1�1���B�H�H�$�$�A����c����,�,�4�4�9�9�;���:�:�<�#9�%�#@�@�����'� "������ 2���"�(�(�*� "������ 2�� "������ 4���X�X�f�E�:�F�(>�e�'D��v��|�$��%�%�s�F�m�4��%�%�s�D��I�&6�7�5 1�6�"�"�4��#7�8�8��GPs�Ic�
�|D]}}||j}|tjtjfvs�5|tjk(r|jd}n|jd}||j|�||<�|S)z�
        Check floating point data columns for NaNs and replace them with
        the generic Stata missing value (.)
        r�r[)rvr�r�r�rDr�)rr�r$rvr�s     rW�
_replace_nanszStataWriter._replace_nans�	sy���A���G�M�M�E�����R�Z�Z�0�0��B�J�J�&�"&�"5�"5�c�":�K�"&�"5�"5�c�":�K��q�'�.�.��5��Q����r�c��y)zNo-op, forward compatibilityNr:rVs rW�_update_strl_nameszStataWriter._update_strl_names�	��r�c��|D];}|dks|dkDs�|dks|dkDs�|dks|dkDs�$|dk7s�*|j|d�}�=|S)a�
        Validate variable names for Stata export.

        Parameters
        ----------
        name : str
            Variable name

        Returns
        -------
        str
            The validated name with invalid characters replaced with
            underscores.

        Notes
        -----
        Stata 114 and 117 support ASCII characters in a-z, A-Z, 0-9
        and _.
        �A�Z�a�zr��9rr)�replace�rr�r$s   rW�_validate_variable_namez#StataWriter._validate_variable_name�	sS��(�A��S��A��G���W��C����W��C����H��|�|�A�s�+����r�c��i}t|j�}|dd}d}t|�D]�\}}|}t|t�st	|�}|j|�}||jvrd|z}d|dcxkrdkrnnd|z}|dtt|�d�}||k(s\|j|�dkDrCdt	|�z|z}|dtt|�d�}|dz
}|j|�dkDr�C|||<|||<��t|�|_|jrCt||�D]4\}	}
|	|
k7s�|j|
|j|	<|j|
=�6|rzg}|j�D]\}}|�d|��}|j|��tj!d	j#|��}
t%j&|
t(t+��
�||_|j/�|S)a�
        Checks column names to ensure that they are valid Stata column names.
        This includes checks for:
            * Non-string names
            * Stata keywords
            * Variables that start with numbers
            * Variables with names that are too long

        When an illegal variable name is detected, it is converted, and if
        dates are exported, the variable name is propagated to the date
        conversion dictionary
        Nrrrr�rtr!rBz   ->   z
    rs)r�r�rr�rrwr�rNrr2r%r�rPrArr�rJr�r�r�rrrTrm)rr��converted_namesr��original_columns�duplicate_var_idr�r��	orig_namer$�o�conversion_warningr�r�s              rW�_check_column_nameszStataWriter._check_column_names�	s��02���t�|�|�$��"�1�:���� ��)�G�A�t��I��d�C�(��4�y���/�/��5�D��t�*�*�*��T�z���d�1�g�$��$��T�z���,�#�c�$�i��,�-�D��9�$��m�m�D�)�A�-���%5�!6�6��=�D�� 4�#�c�$�i��"4�5�D�$��)�$�	�m�m�D�)�A�-�
.2��	�*��G�A�J�5*�8�W�~�������G�%5�6���1���6�-1�-@�-@��-C�D�'�'��*��+�+�A�.�7�
�!#��#2�#8�#8�#:��	�4�"��8�D�6�2��"�)�)�#�.�$;�"�(�(����7I�)J�K�B��M�M��!�+�-�
�!0������!��r�c��g|_g|_|j�D]i\}}|jjt	||j
|��|jjt
||j
|���kyrZ)r�rprArrHr�rB)rr�r�rvs    rW�_set_formats_and_typesz"StataWriter._set_formats_and_types
sh��"$���"$��� �,�,�.�J�C���L�L��� ;�E�4�9�9�S�>� R�S��L�L��� 4�U�D�I�I�c�N� K�L�)r�c� �|j�}|jr"|j�}t|t�r|}|j|�}t
|�}|j|�}tjd|jd�|_|j|�}|D�cgc]}|j��}}|jj|�}|xj|zc_|j j#|�|j%|�}|j\|_|_||_|jj-�|_|j0}|D]D}||j2vr�t5j6||j8d�s�6d|j2|<�Ft;|j2|j.�|_|j2D]<}	t=|j2|	�}
tj8|
�|j>|	<�>|jA�|jC|�|j2�?|j2D]/}	t|	tD�s�|j2|	|jF|	<�1yycc}w)NFrBr�rp)$r�rO�reset_indexr�r$rr�rkr��repeatr�rRrcrr�rrQ�extendri�nobs�nvarr��tolistr:r�r�rr�rvr<r6r��_encode_stringsr�r`r�)rr��tempr_rb�non_cat_columns�has_non_cat_val_labelsr�r�r?�new_types           rWrUzStataWriter._prepare_pandas
s;���y�y�{������#�#�%�D��$�	�*����'�'��-��$�D�)���!�!�$�'��"$���5�$�*�*�Q�-�!@��� $�A�A�$�G��2F�G�2F�3�3�;�;�2F��G�!%���!2�!2�?�!C�����"8�8�����!�!�"6�7��)�)�$�/��#�z�z���	�4�9���	��|�|�*�*�,��������C��d�)�)�)�����t�C�y����4�+/��#�#�C�(�	�9�������
����&�&�C�6�t�7J�7J�3�7O�P�H�!�x�x��1�F�K�K���'�
	
�����#�#�F�+����*��*�*���c�3�'�(,�(;�(;�C�(@�D�L�L��%�+�+��EHs�)Jc�B�|j}t|dg�}t|j�D]�\}}||vs||vr�|j|}|j}|j
tjus�Gt|d��}|dk(s)t|�dk(s|j}td|�d���|j|jj|j�}tt!|j"��|j$ks��||j|<��y)	z�
        Encode strings in dta-specific encoding

        Do not encode columns marked for date conversion or for strL
        conversion. The strL converter independently handles conversion and
        also accepts empty string arrays.
        �
_convert_strlTr�rWrzColumn `a` cannot be exported.

Only string-like object arrays
containing all strings or a mix of strings and None can be exported.
Object arrays containing only null values are prohibited. Other object
types cannot be exported and must first be converted to one of the
supported types.N)r�rOrr�rvr^r�rArrr�rfrrrrrr��_max_string_length)	rr��convert_strlr r�rrv�inferred_dtype�encodeds	         rWr�zStataWriter._encode_stringsa
s	���+�+�
��t�_�b�9����	�	�*�F�A�s��M�!�S�L�%8���Y�Y�s�^�F��L�L�E��z�z�R�Z�Z�'�!,�V�D�!A��'�8�3��F��q�8H� �+�+�C�$�	�	�������)�)�C�.�,�,�3�3�D�N�N�C��)��w���)G�H��.�.�/�&-�D�I�I�c�N�1+r�c	��t|jd|jd|j��5|_|jj
d�n|jjt�c|_|j_|jjj|jj�	|j|j|j��|j�|j�|j!�|j#�|j%�|j'�|j)�|j+�|j-�|j/�}|j1|�|j3�|j5�|j7�|j�|j9�	ddd�y#t:$r�}|jj=�t?|jt@tBjDf�r�tBjFjI|j�rd	tCjJ|j�|�#tL$r6tOjPd|j�d�tRtU��	�Y|�wxYw|�d}~wwxYw#1swYyxYw)
a
        Export DataFrame object to Stata dta format.

        Examples
        --------
        >>> df = pd.DataFrame({"fully_labelled": [1, 2, 3, 3, 1],
        ...                    "partially_labelled": [1.0, 2.0, np.nan, 9.0, np.nan],
        ...                    "Y": [7, 7, 9, 8, 10],
        ...                    "Z": pd.Categorical(["j", "k", "l", "k", "j"]),
        ...                    })
        >>> path = "/My_path/filename.dta"
        >>> labels = {"fully_labelled": {1: "one", 2: "two", 3: "three"},
        ...           "partially_labelled": {1.0: "one", 2.0: "two"},
        ...           }
        >>> writer = pd.io.stata.StataWriter(path,
        ...                                  df,
        ...                                  value_labels=labels)  # doctest: +SKIP
        >>> writer.write_file()  # doctest: +SKIP
        >>> df = pd.read_stata(path)  # doctest: +SKIP
        >>> df  # doctest: +SKIP
            index fully_labelled  partially_labelled  Y  Z
        0       0            one                 one  7  j
        1       1            two                 two  7  k
        2       2          three                 NaN  9  l
        3       3          three                 9.0  8  k
        4       4            one                 NaN 10  j
        �wbF)r�rr7�methodN)rrz!This save was not successful but z. could not be deleted. This file is not valid.rs)+r)rVr�r7r	r�rrrS�created_handlesr�
_write_headerrPrR�
_write_map�_write_variable_types�_write_varnames�_write_sortlist�_write_formats�_write_value_label_names�_write_variable_labels�_write_expansion_fields�_write_characteristics�
_prepare_data�_write_data�_write_strls�_write_value_labels�_write_file_close_tag�_close�	Exceptionrr�r�os�PathLike�path�isfile�unlink�OSErrorr�r�rr)r�records�excs   rW�
write_filezStataWriter.write_file�
sE��8��K�K���)�)�� �0�0�
��\��|�|�'�'��1�=�:>���9L�9L�g�i�6��!�4�<�<�#6����,�,�3�3�D�L�L�4G�4G�H�"
��"�"�#�/�/�D�<L�<L�#�����!��*�*�,��$�$�&��$�$�&��#�#�%��-�-�/��+�+�-��,�,�.��+�+�-��,�,�.��� � ��)��!�!�#��(�(�*��*�*�,����!����
�A
�
��B�
����"�"�$��d�k�k�C����+=�>�2�7�7�>�>��K�K�D���	�	�$�+�+�.��	��#�� �
�
�?����}�MB�B�+�'7�'9�	��	����	��
��C
�
�sP�B
K�=D(G/�/	K�8A-K�&J�K�;K�K�K�K�K�K�Kc�4�|j��t|jjt�sJ�|jj|jc}|j_|jjj|j
��yy)z�
        Close the file if it was created by the writer.

        If a buffer or file-like object was passed in, for example a GzipFile,
        then leave this file open for the caller to close.
        N)rSr�r	rrr%r))rr+s  rWr�zStataWriter._close�
sp�����(��d�l�l�1�1�7�;�;�;�'+�|�|�':�':�D�<M�<M�$�C����$��L�L���%�%�c�l�l�n�5�)r�c��y��No-op, future compatibilityNr:rVs rWr�zStataWriter._write_map�
rnr�c��yr�r:rVs rWr�z!StataWriter._write_file_close_tag�
rnr�c��yr�r:rVs rWr�z"StataWriter._write_characteristics�
rnr�c��yr�r:rVs rWr�zStataWriter._write_strls�
rnr�c�:�|jtdd��y)z"Write 5 zeros for expansion fieldsr�r�N)r[r'rVs rWr�z#StataWriter._write_expansion_fields�
s�����J�r�1�%�&r�c�z�|jD],}|j|j|j���.yrZ)rQr]r0r!)rrs  rWr�zStataWriter._write_value_labels�
s/���$�$�B����b�5�5�d�o�o�F�G�%r�c	���|j}|jtjdd��|j	|dk(xrdxsd�|j	d�|j	d�|jtj|dz|j
�dd�|jtj|d	z|j�dd
�|�+|j|jtdd���n-|j|jt|ddd���|�tj�}nt|t�std
��gd�}t|�D��cic]\}}|dz|��
}}}|jd�||jz|jd�z}|j|j|��ycc}}w)Nrir�r@��r2r�r�r r�r��P�"time_stamp should be datetime type��Jan�Feb�Mar�Apr�May�Jun�Jul�Aug�Sep�Oct�Nov�DecrB�%d �	 %Y %H:%M)r!r]r�r&r[r�r��_null_terminate_bytesr'r�nowr�rfr�strftimerR)	rrrr*�monthsr rR�month_lookup�tss	         rWr�zStataWriter._write_header�
s���
�O�O�	����&�+�+�c�3�/�0����I��$�/��9�6�:����F�����F�����&�+�+�i�#�o�t�y�y�A�"�1�E�F����&�+�+�i�#�o�t�y�y�A�"�1�E�F������d�8�8��B��9K�L�M�����*�*�:�j��"�o�r�+J�K�
�
��!����J��J��1��A�B�B�

��6?�v�5F�G�5F���E��A��u��5F��G�����&��:�+�+�,�
-��!�!�+�.�
/�	�
	
���$�4�4�R�8�9��
Hs�8Gc�p�|jD]'}|jtjd|���)y)Nr)rpr]r�r&)rrss  rWr�z!StataWriter._write_variable_types+s)���<�<�C����f�k�k�#�s�3�4� r�c��|jD]3}|j|�}t|ddd�}|j|��5y)Nr!rv)r:�_null_terminate_strr'r[)rr�s  rWr�zStataWriter._write_varnames/s@���L�L�D��+�+�D�1�D��d�3�B�i��,�D��K�K���!r�c�^�tdd|jdzz�}|j|�y)Nr�r�rB)r'r�r[)r�srtlists  rWr�zStataWriter._write_sortlist7s'���R��d�i�i�!�m�!4�5�����G�r�c�\�|jD]}|jt|d���y)Nr~)r�r[r')rr�s  rWr�zStataWriter._write_formats<s#���<�<�C��K�K�
�3��+�,� r�c��t|j�D]m}|j|rA|j|}|j	|�}t|ddd�}|j
|��S|j
tdd���oy)Nr!rvr�)r(r�rRr:r�r'r[)rr r�s   rWr�z$StataWriter._write_value_label_namesAsq���t�y�y�!�A��%�%�a�(��|�|�A����/�/��5��!�$�s��)�R�0�����D�!����J�r�2�.�/�"r�c��tdd�}|j�,t|j�D]}|j	|��y|j
D]�}||jvrc|j|}t
|�dkDrtd��td�|D��}|std��|j	t|d���t|j	|���y)Nr�r�r��.Variable labels must be 80 characters or fewerc3�8K�|]}t|�dk���y�w)�N)�ord)r�r$s  rWr�z5StataWriter._write_variable_labels.<locals>.<genexpr>[s����<�e���A����e�s�zKVariable labels must contain only characters that can be encoded in Latin-1)	r'rir(r�r[r�rrfr)r�blankr r�r
�	is_latin1s      rWr�z"StataWriter._write_variable_labelsMs����2�r�"��� � �(��4�9�9�%�����E�"�&���9�9�C��d�+�+�+��-�-�c�2���u�:��?�$�%U�V�V��<�e�<�<�	� �$�4������J�u�b�1�2����E�"�r�c��|S)r�r:)rr�s  rW�_convert_strlszStataWriter._convert_strlses���r�c�D�|j}|j}|j}|j�7t|�D])\}}||vs�t	|||j
|�||<�+|j
|�}i}|jttj�k(}t|�D]�\}}||}||jkr�tj�5tjddt��||j!d�}	ddd�	j#t$|f��||<d|��}
|
||<||j'|
�||<��||j(}|s|j+|j�}|||<��|j-d|��S#1swY��xYw)	N�ignorezDowncasting object dtype arrays)rr�)�argsrsF)rK�
column_dtypes)r�rpr�rr�r�r�r!r�r�r*r�r��catch_warnings�filterwarningsrr�r�r'r�rvr��
to_records)rr�rpr�r r�r��native_byteorderrs�dc�stypervs            rWr�zStataWriter._prepare_datais����y�y���,�,���+�+�
����*�#�D�/���3��
�%� >��S�	�4�<�<��?�!�D��I�*��"�"�4�(�����?�?�o�c�m�m�.L�L����o�F�A�s��!�*�C��d�-�-�-��,�,�.��+�+� �9�!.��
�c��)�)�"�-�B�
/��H�H�Z�s�f�H�=��S�	��C�5�	��#��s�� ��I�,�,�U�3��S�	��S�	����'�!�.�.�t���?�E�#��s��%&�(���U�&��A�A�#/�.�s�1F�F	c�B�|j|j��yrZ)r]�tobytes�rr�s  rWr�zStataWriter._write_data�s�����'�/�/�+�,r�c��|dz
}|S)Nr2r:)r�s rWr�zStataWriter._null_terminate_str�s��	�V����r�c�V�|j|�j|j�SrZ)r�rr)rr�s  rWr�z!StataWriter._null_terminate_bytes�s"���'�'��*�1�1�$�.�.�A�Ar�)NTNNNNrN)rJ�FilePath | WriteBuffer[bytes]r�r$r��dict[Hashable, str] | NonerXr�r*rr�datetime | Nonerrrr�r�r.r7rr	�'dict[Hashable, dict[float, str]] | Noner�r3)rZrr�r3)rar5r�r3)r�r$r�zlist[StataNonCatValueLabel]r!r4�r�rr�r�r�r'r�r3)r�r$r�r3�NN�rrrr�r�r3)r��np.rec.recarray)r�r�r�r3)r�rr�r)r�rr�r5))r6r7r8r9r�rrhrr[r]rcrirkrmrwrr�rUr�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r�r��staticmethodr�r�r&r's@rWrLrL�s����
L�\��-6�I�*�6�59� � $�&*�!%�6:�*1�15�$L�AE�$L�,�$L��$L�2�	$L�
�$L��
$L�$�$L��$L�4�$L�(�$L�/�$L�>�$L�
�$L�LC�)�$��$�	$�$�B(9�T�"+��<G�RM�@A�D#-�JK�Z6�*�*�*�*�'�H�"&�&*�5:��5:�$�5:�
�	5:�n5���
-�

0�#�0�%B�N-�����Br�rLc���|ry|jtjur2tt	|j
��}t
|d�}|dkr|Sy|jtjury|jtjury|jtjury|jtjury|jtjurytd	|�d
���)a
    Converts dtype types to stata types. Returns the byte of the given ordinal.
    See TYPE_MAP and comments for an explanation. This is also explained in
    the dta spec.
    1 - 2045 are strings of this length
                Pandas    Stata
    32768 - for object    strL
    65526 - for int8      byte
    65527 - for int16     int
    65528 - for int32     long
    65529 - for float32   float
    65530 - for double    double

    If there are dates to convert, then dtype will already have the correct
    type inserted.
    ryrBrkrzr{r|r}r~r>r?r@)rvrrFr�s    rW�_dtype_to_stata_type_117r��s���$���z�z�R�Z�Z��(�
�f�n�n�(E�F���x��#���t���O��	���r�z�z�	!��	���r�z�z�	!��	���r�x�x�	��	���r�x�x�	��	���r�w�w�	��!�J�u�g�_�"E�F�Fr�c�b�t|t�rt|d�}|d|t|�z
zzS)zU
    Takes a bytes instance and pads it with null bytes until it is length bytes long.
    rr)r�rr5rr3s  rW�_pad_bytes_newr�s3���$����T�7�#���'�V�c�$�i�/�0�0�0r�c�H�eZdZdZ		d									dd�Zd	d�Zd
d�Zdd�Zy)�StataStrLWritera�
    Converter for Stata StrLs

    Stata StrLs map 8 byte values to strings which are stored using a
    dictionary-like format where strings are keyed to two values.

    Parameters
    ----------
    df : DataFrame
        DataFrame to convert
    columns : Sequence[str]
        List of column names to convert to StrL
    version : int, optional
        dta version.  Currently supports 117, 118 and 119
    byteorder : str, optional
        Can be ">", "<", "little", or "big". Default is `sys.byteorder`

    Notes
    -----
    Supports creation of the StrL block of a dta file for dta versions
    117, 118 and 119.  These differ in how the GSO is stored.  118 and
    119 store the GSO lookup value as a uint32 and a uint64, while 117
    uses two uint32s. 118 and 119 also encode all strings as unicode
    which is required by the format.  117 uses 'latin-1', a fixed-width
    encoding that extends the 7-bit ascii table with an additional 128
    characters.
    Nc�*�|dvrtd��||_||_||_ddi|_|�t
j}t|�|_d}d}d|_	|dk(rd	}d}d
|_	n
|dk(rd}nd
}ddd|z
zz|_
||_||_y)Nr:z,Only dta versions 117, 118 and 119 supportedr��rrr%r�rr;r�rrr�r�r�r)
rf�_dta_ver�dfr��
_gso_tabler�r*r�r!r�_o_offet�_gso_o_type�_gso_v_type)rrr�r>r*�
gso_v_type�
gso_o_type�o_sizes        rWrzStataStrLWriter.__init__�s����/�)��K�L�L���
��������v�,������
�
�I�)�)�4����
��
� ����c�>��F��J�&�D�N�
��^��F��F��a�1�v�:�.�/��
�%���%��r�c�0�|\}}||j|zzSrZ)r	)rr?r�r}s    rW�_convert_keyzStataStrLWriter._convert_key
s�����1��4�=�=�1�$�$�$r�c��|j}|j}t|j�}||j}|jD�cgc]}||j	|�f��}}tj|jt
j��}t|j��D]b\}\}	}
t|�D]L\}\}}|
|}
|
�dn|
}
|j|
d�}|�|dz|dzf}|||
<|j|�|||f<�N�dt|j�D]\}}|dd�|f||<�||fScc}w)a�
        Generates the GSO lookup table for the DataFrame

        Returns
        -------
        gso_table : dict
            Ordered dictionary using the string found as keys
            and their lookup position (v,o) as values
        gso_df : DataFrame
            DataFrame where strl columns have been converted to
            (v,o) values

        Notes
        -----
        Modifies the DataFrame in-place.

        The DataFrame returned encodes the (v,o) values as uint64s. The
        encoding depends on the dta version, and can be expressed as

        enc = v + o * 2 ** ((8 - o_size) * 8)

        so that v is stored in the lower bits and o is in the upper
        bits. o_size, the number of bytes used to store o, is

          * 117: 4
          * 118: 6
          * 119: 5
        ruNr�rB)
rrr�r�rKr��emptyr�r�r�iterrows�getr)r�	gso_table�gso_dfr��selectedr��	col_indexrr}r��rowr�r�rr?r s                rW�generate_tablezStataStrLWriter.generate_tables?��:�O�O�	������v�~�~�&���$�,�,�'��:>�,�,�G�,�3�c�7�=�=��-�.�,�	�G��x�x����b�i�i�8��&�x�'8�'8�':�;�M�A�z��S�(��3���8�C���#�h���K�b�S���m�m�C��.���;��q�5�!�a�%�.�C�%(�I�c�N�!�.�.�s�3��Q��T�
� 4�<� ����-�F�A�s��q�!�t�*�F�3�K�.��&� � ��!Hs�Ec	�8�t�}tdd�}tj|jdzd�}tj|jdzd�}|j|j
z}|j|jz}|jdz}|j�D]�\}	}
|
dk(r�|
\}}|j|�|jtj||��|jtj||��|j|�t|	d�}
|jtj|t|
�d	z��|j|
�|j|���|j�S)
a�
        Generates the binary blob of GSOs that is written to the dta file.

        Parameters
        ----------
        gso_table : dict
            Ordered dictionary (str, vo)

        Returns
        -------
        gso : bytes
            Binary content of dta file to be placed between strl tags

        Notes
        -----
        Output format depends on dta version.  117 uses two uint32s to
        express v and o while 118+ uses a uint32 for v and a uint64 for o.
        r��asciirr�rr%rrrB)rr5r�r&r!rr
rAr%rr))rrr+�gso�gso_type�null�v_type�o_type�len_type�strl�vor�r}�utf8_strings              rW�
generate_blobzStataStrLWriter.generate_blobDsJ��:�i���E�7�#���;�;�t����4�c�:���{�{�4�?�?�S�0�!�4�����4�#3�#3�3�����4�#3�#3�3���?�?�S�(��!���)�H�D�"��V�|���D�A�q�
�I�I�c�N�
�I�I�f�k�k�&�!�,�-�
�I�I�f�k�k�&�!�,�-�
�I�I�h�� ��g�.�K��I�I�f�k�k�(�C��,<�q�,@�A�B�
�I�I�k�"��I�I�d�O�/*�2�|�|�~�r�)r;N)
rr$r�r"r>r`r*rr�r3)r?ztuple[int, int]r�r`)r�z,tuple[dict[str, tuple[int, int]], DataFrame])rzdict[str, tuple[int, int]]r�r5)r6r7r8r9rrrr&r:r�rWrr�sV���@� $�&��&��&��	&�
�&�
�
&�B%�1!�f=r�rc�*��eZdZdZdZdZ									ddd�																									d�fd�Zedd��Zdd�Z			d					dd	�Z
d d
�Zd d�Zd d�Z
d d
�Zd d�Zd d�Zd d�Zd d�Zd d�Zd d�Zd d�Zd d�Zd d�Zd d�Zd!d�Zd"d�Z�xZS)#�StataWriter117a�

    A class for writing Stata binary dta files in Stata 13 format (117)

    Parameters
    ----------
    fname : path (string), buffer or path object
        string, path object (pathlib.Path or py._path.local.LocalPath) or
        object implementing a binary write() function. If using a buffer
        then the buffer will not be automatically closed after the file
        is written.
    data : DataFrame
        Input to save
    convert_dates : dict
        Dictionary mapping columns containing datetime types to stata internal
        format to use when writing the dates. Options are 'tc', 'td', 'tm',
        'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name.
        Datetime columns that do not have a conversion type specified will be
        converted to 'tc'. Raises NotImplementedError if a datetime column has
        timezone information
    write_index : bool
        Write the index to Stata dataset.
    byteorder : str
        Can be ">", "<", "little", or "big". Default is `sys.byteorder`
    time_stamp : datetime
        A datetime to use as file creation date.  Default is the current time
    data_label : str
        A label for the data set.  Must be 80 characters or smaller.
    variable_labels : dict
        Dictionary containing columns as keys and variable labels as values.
        Each label must be 80 characters or smaller.
    convert_strl : list
        List of column names to convert to Stata StrL format.  Columns with
        more than 2045 characters are automatically written as StrL.
        Smaller columns can be converted by including the column name.  Using
        StrLs can reduce output file size when strings are longer than 8
        characters, and either frequently repeated or sparse.
    {compression_options}

        .. versionchanged:: 1.4.0 Zstandard support.

    value_labels : dict of dicts
        Dictionary containing columns as keys and dictionaries of column value
        to labels as values. The combined length of all labels for a single
        variable must be 32,000 characters or smaller.

        .. versionadded:: 1.4.0

    Returns
    -------
    writer : StataWriter117 instance
        The StataWriter117 instance has a write_file method, which will
        write the file to the given `fname`.

    Raises
    ------
    NotImplementedError
        * If datetimes contain timezone information
    ValueError
        * Columns listed in convert_dates are neither datetime64[ns]
          or datetime
        * Column dtype is not representable in Stata
        * Column listed in convert_dates is not in DataFrame
        * Categorical label contains more than 32,000 characters

    Examples
    --------
    >>> data = pd.DataFrame([[1.0, 1, 'a']], columns=['a', 'b', 'c'])
    >>> writer = pd.io.stata.StataWriter117('./data_file.dta', data)
    >>> writer.write_file()

    Directly write a zip file

    >>> compression = {"method": "zip", "archive_name": "data_file.dta"}
    >>> writer = pd.io.stata.StataWriter117(
    ...     './data_file.zip', data, compression=compression
    ...     )
    >>> writer.write_file()

    Or with long strings stored in strl format

    >>> data = pd.DataFrame([['A relatively long string'], [''], ['']],
    ...                     columns=['strls'])
    >>> writer = pd.io.stata.StataWriter117(
    ...     './data_file_with_long_strings.dta', data, convert_strl=['strls'])
    >>> writer.write_file()
    rkr;NrMc
���g|_|	�|jj|	�t�
|�
||||||||||
|��i|_d|_y)N)r*rrrr	r�r7r�)r�r�r�r�_map�
_strl_blob)rrJr�r�rXr*rrrr�r�r7r	r�s             �rWrzStataWriter117.__init__�sj���".0����#����%�%�l�3�
��������!�!�+�%�#�+�	�	
�%'��	���r�c��t|t�rt|d�}td|zdzd�|ztd|zdzd�zS)zSurround val with <tag></tag>rrAr@z</)r�rr5)r�tags  rW�_tagzStataWriter117._tag
sI���c�3����W�%�C��S�3�Y��_�g�.��4�u�T�C�Z�#�=M�w�7W�W�Wr�c��|jj�J�|jjj�|j|<y)z.Update map location for tag with file positionN)r	rr�r*)rr-s  rW�_update_mapzStataWriter117._update_map	
s8���|�|�"�"�.�.�.����,�,�1�1�3��	�	�#�r�c	�j�|j}|jtdd��t�}|j	|jtt
|j�d�d��|j	|j|dk(xrdxsdd��|jdkrd	nd
}|j	|jtj||z|j�d��|jdk(rd
nd
}|j	|jtj||z|j�d��|�|ddnd}|j|j�}|jdk(rdnd	}	tj||	zt|��}
|
|z}|j	|j|d��|�tj �}nt#|t�st%d��gd�}t'|�D��
cic]\}}
|dz|
��
}}}
|j)d�||j*z|j)d�z}dt|d�z}|j	|j|d��|j|j|j-�d��ycc}
}w)zWrite the file headerz<stata_dta>r�releaser@�MSF�LSFr*rrr%�Kr;r��NNr�r�rr
r�r�rBr�r���	timestamp�header)r!r]r5rr%r.r�_dta_versionr�r&r�r�rrrrr�r�rfrr�rRr))rrrr*r+�	nvar_type�	nobs_sizer
�
encoded_label�
label_size�	label_lenr�r rRr�r��stata_tss                 rWr�zStataWriter117._write_header
sJ���O�O�	����%�
�w�7�8��i���	�	�$�)�)�E�#�d�&7�&7�"8�'�B�I�N�O��	�	�$�)�)�I��,�6��?�%��M�N��,�,��3�C��	��	�	�$�)�)�F�K�K�	�I�(=�t�y�y�I�3�O�P��,�,��3�C��	��	�	�$�)�)�F�K�K�	�I�(=�t�y�y�I�3�O�P�#-�#9�
�3�B��r�����T�^�^�4�
� �-�-��4�S�#�
��K�K�	�J� 6��M�8J�K�	�!�M�1�
��	�	�$�)�)�M�7�3�4���!����J��J��1��A�B�B�

��6?�v�5F�G�5F���E��A��u��5F��G�����&��:�+�+�,�
-��!�!�+�.�
/�	��U�2�w�/�/���	�	�$�)�)�H�k�2�3����$�)�)�C�L�L�N�H�=�>��Hs�	J/c��|js8d|jjj�ddddddddddddd�|_|jjj	|jd�t�}|jj
�D]4}|jtj|jdz|���6|j|j|j�d��y)z�
        Called twice during file write. The first call populates the values in
        the map with 0s.  The second call writes the final map locations when
        all blocks have been written.
        r)�
stata_data�map�variable_types�varnames�sortlist�formats�value_label_namesr�characteristicsr��strlsr	�stata_data_close�end-of-filerCr�N)r*r	rr�r`rrir%r�r&r!r]r.r))rr+rs   rWr�zStataWriter117._write_mapH
s����y�y���|�|�*�*�/�/�1�"#����%&�#$�#$��� !�$%� ��D�I�"	
����� � ����5�!1�2��i���9�9�#�#�%�C��I�I�f�k�k�$�/�/�C�"7��=�>�&����$�)�)�C�L�L�N�E�:�;r�c��|jd�t�}|jD]4}|jt	j
|jdz|���6|j|j|j�d��y)NrDr)
r0rrpr%r�r&r!r]r.r))rr+rss   rWr�z$StataWriter117._write_variable_typesf
sf�����)�*��i���<�<�C��I�I�f�k�k�$�/�/�C�"7��=�>� ����$�)�)�C�L�L�N�4D�E�Fr�c�z�|jd�t�}|jdk(rdnd}|jD]O}|j	|�}t|ddj
|j�|dz�}|j|��Q|j|j|j�d��y)NrEr;r!r#rB)r0rr:r:r�rrrr%r]r.r))rr+�vn_lenr�s    rWr�zStataWriter117._write_varnamesm
s�������$��i���(�(�C�/��S���L�L�D��+�+�D�1�D�!�$�s��)�"2�"2�4�>�>�"B�F�Q�J�O�D��I�I�d�O�!�	
���$�)�)�C�L�L�N�J�?�@r�c��|jd�|jdkrdnd}|j|jd|z|jdzzd��y)NrFr<r�r�rrB)r0r:r]r.r�)r�	sort_sizes  rWr�zStataWriter117._write_sortlistx
sO������$��*�*�S�0�A�a�	����$�)�)�G�i�$7�4�9�9�q�=�$I�:�V�Wr�c�H�|jd�t�}|jdk(rdnd}|jD]6}|j	t|j
|j�|���8|j|j|j�d��y)NrGr;r~r|)r0rr:r�r%rrrr]r.r))rr+�fmt_lenr�s    rWr�zStataWriter117._write_formats}
sx������#��i���)�)�S�0�"�b���<�<�C��I�I�n�S�Z�Z����%?��I�J� ����$�)�)�C�L�L�N�I�>�?r�c���|jd�t�}|jdk(rdnd}t|j�D]o}d}|j
|r|j|}|j|�}t|ddj|j�|dz�}|j|��q|j|j|j�d��y)NrHr;r!r#r�rB)r0rr:r(r�rRr:r�rrrr%r]r.r))rr+�vl_lenr r��encoded_names      rWr�z'StataWriter117._write_value_label_names�
s������,�-��i���(�(�C�/��S���t�y�y�!�A��D��%�%�a�(��|�|�A����+�+�D�1�D�)�$�s��)�*:�*:�4�>�>�*J�F�UV�J�W�L��I�I�l�#�"�	
���$�)�)�C�L�L�N�4G�H�Ir�c�$�|jd�t�}|jdk(rdnd}td|dz�}|j�[t|j�D]}|j|��|j|j|j�d��y|jD]�}||jvrc|j|}t|�dkDrtd��	|j|j�}|jt||dz���t|j|���|j|j|j�d��y#t $r}td|j���|�d}~wwxYw)	Nrr;r�i@r�rBr�zDVariable labels must contain only characters that can be encoded in )r0rr:rrir(r�r%r]r.r)r�rrfrr�UnicodeEncodeError)	rr+rUr�rrr�r
r�rts	         rWr�z%StataWriter117._write_variable_labels�
sc�����*�+��i���(�(�C�/��S���r�6�A�:�.��� � �(��4�9�9�%���	�	�%� �&����d�i�i�����8I�J�K���9�9�C��d�+�+�+��-�-�c�2���u�:��?�$�%U�V�V��#�l�l�4�>�>�:�G��	�	�.��&�1�*�=�>��	�	�%� �� 	
���$�)�)�C�L�L�N�4E�F�G��*��$�-�-1�^�^�,<�>������s�+E(�(	F�1F
�
Fc�h�|jd�|j|jdd��y)NrIr�)r0r]r.rVs rWr�z%StataWriter117._write_characteristics�
s+�����*�+����$�)�)�C�):�;�<r�c��|jd�|jd�|j|j��|jd�y)Nr�s<data>s</data>)r0r]r�r�s  rWr�zStataWriter117._write_data�
sA������ ����)�$����'�/�/�+�,����*�%r�c�|�|jd�|j|j|jd��y)NrJ)r0r]r.r+rVs rWr�zStataWriter117._write_strls�
s-������!����$�)�)�D�O�O�W�=�>r�c��y)zNo-op in dta 117+Nr:rVs rWr�z&StataWriter117._write_expansion_fields�
rnr�c�6�|jd�t�}|jD]@}|j|j�}|j|d�}|j
|��B|j|j|j�d��y)Nr	�lbl)	r0rrQr0r!r.r%r]r))rr+r�labs    rWr�z"StataWriter117._write_value_labels�
sw������(��i���$�$�B��)�)�$�/�/�:�C��)�)�C��'�C��I�I�c�N�%�	
���$�)�)�C�L�L�N�N�C�Dr�c�~�|jd�|jtdd��|jd�y)NrKz</stata_dta>rrL)r0r]r5rVs rWr�z$StataWriter117._write_file_close_tag�
s4�����+�,����%���8�9�����'r�c��|jj�D]>\}}||jvs�|jj|�}||j|<�@y)z�
        Update column names for conversion to strl if they might have been
        changed to comply with Stata naming rules
        N)rTrAr�rK)r�orig�newr�s    rWrmz!StataWriter117._update_strl_names�
sU���.�.�4�4�6�I�D�#��t�)�)�)��(�(�.�.�t�4��*-��"�"�3�'�7r�c��t|�D��cgc]'\}}|j|dk(s||jvr|��)}}}|rCt|||j��}|j�\}}|}|j
|�|_|Scc}}w)zg
        Convert columns to StrLs if either very large or in the
        convert_strl variable
        ryr=)rrpr�rr:rr&r+)rr�r r��convert_cols�ssw�tab�new_datas        rWr�zStataWriter117._convert_strls�
s���$�D�/�
�)���3��|�|�A��%�'�3�$�2D�2D�+D�
�)�	�
��!�$��d�>O�>O�P�C��.�.�0�M�C���D�!�/�/��4�D�O����
s�,Bc�T�g|_g|_|j�D]�\}}||jv}t	||j
||j|��}|jj|�|jjt||j
||����y)N)rErF)	rpr�rAr�rHr�r:rr�)rr�r�rvrFr�s      rWr�z%StataWriter117._set_formats_and_types�
s��������� �,�,�.�J�C���� 2� 2�2�J�-���	�	�#�� �-�-�%�	�C�
�L�L����$��L�L���(���	�	�#��
�K�
�)r�)	NTNNNNNrN)rJr�r�r$r�r�rXr�r*rrr�rrrr�r��Sequence[Hashable] | Noner�r.r7rr	r�r�r3)r�str | bytesr-rr�r5)r-rr�r3r�r�r4r!r�)r6r7r8r9r�r:rr�r.r0r�r�r�r�r�r�r�r�r�r�r�r�r�r�rmr�r�r&r's@rWr(r(�sc���S�j���L�59� � $�&*�!%�6:�26�*1�15�#�AE�#�,�#��#�2�	#�
�#��
#�$�#��#�4�#�0�#�(�#�/�#�>�#�
�#�J�X��X�4�"&�&*�8?��8?�$�8?�
�	8?�t<�<G�	A�X�
@�
J�H�@=�&�?� �E�(�
	.��$r�r(c���eZdZUdZdZded<										d	dd�																											d
�fd�Zdd�Z�xZS)�StataWriterUTF8u�
    Stata binary dta file writing in Stata 15 (118) and 16 (119) formats

    DTA 118 and 119 format files support unicode string data in both fixed-width
    and strL formats. Unicode is also supported in value labels, variable
    labels and the dataset label. Format 119 is automatically used if the
    file contains more than 32,767 variables.

    Parameters
    ----------
    fname : path (string), buffer or path object
        string, path object (pathlib.Path or py._path.local.LocalPath) or
        object implementing a binary write() function. If using a buffer
        then the buffer will not be automatically closed after the file
        is written.
    data : DataFrame
        Input to save
    convert_dates : dict, default None
        Dictionary mapping columns containing datetime types to stata internal
        format to use when writing the dates. Options are 'tc', 'td', 'tm',
        'tw', 'th', 'tq', 'ty'. Column can be either an integer or a name.
        Datetime columns that do not have a conversion type specified will be
        converted to 'tc'. Raises NotImplementedError if a datetime column has
        timezone information
    write_index : bool, default True
        Write the index to Stata dataset.
    byteorder : str, default None
        Can be ">", "<", "little", or "big". Default is `sys.byteorder`
    time_stamp : datetime, default None
        A datetime to use as file creation date.  Default is the current time
    data_label : str, default None
        A label for the data set.  Must be 80 characters or smaller.
    variable_labels : dict, default None
        Dictionary containing columns as keys and variable labels as values.
        Each label must be 80 characters or smaller.
    convert_strl : list, default None
        List of column names to convert to Stata StrL format.  Columns with
        more than 2045 characters are automatically written as StrL.
        Smaller columns can be converted by including the column name.  Using
        StrLs can reduce output file size when strings are longer than 8
        characters, and either frequently repeated or sparse.
    version : int, default None
        The dta version to use. By default, uses the size of data to determine
        the version. 118 is used if data.shape[1] <= 32767, and 119 is used
        for storing larger DataFrames.
    {compression_options}

        .. versionchanged:: 1.4.0 Zstandard support.

    value_labels : dict of dicts
        Dictionary containing columns as keys and dictionaries of column value
        to labels as values. The combined length of all labels for a single
        variable must be 32,000 characters or smaller.

        .. versionadded:: 1.4.0

    Returns
    -------
    StataWriterUTF8
        The instance has a write_file method, which will write the file to the
        given `fname`.

    Raises
    ------
    NotImplementedError
        * If datetimes contain timezone information
    ValueError
        * Columns listed in convert_dates are neither datetime64[ns]
          or datetime
        * Column dtype is not representable in Stata
        * Column listed in convert_dates is not in DataFrame
        * Categorical label contains more than 32,000 characters

    Examples
    --------
    Using Unicode data and column names

    >>> from pandas.io.stata import StataWriterUTF8
    >>> data = pd.DataFrame([[1.0, 1, 'ᴬ']], columns=['a', 'β', 'ĉ'])
    >>> writer = StataWriterUTF8('./data_file.dta', data)
    >>> writer.write_file()

    Directly write a zip file

    >>> compression = {"method": "zip", "archive_name": "data_file.dta"}
    >>> writer = StataWriterUTF8('./data_file.zip', data, compression=compression)
    >>> writer.write_file()

    Or with long strings stored in strl format

    >>> data = pd.DataFrame([['ᴀ relatively long ŝtring'], [''], ['']],
    ...                     columns=['strls'])
    >>> writer = StataWriterUTF8('./data_file_with_long_strings.dta', data,
    ...                          convert_strl=['strls'])
    >>> writer.write_file()
    rzLiteral['utf-8']rNrMc
����|
�|jddkrdnd}
n1|
dvrtd��|
dk(r|jddkDrtd��t�|�
|||||||||
|	||��|
|_y)	NrBi�rr<)rr<z"version must be either 118 or 119.zKYou must use version 119 for data sets containing more than32,767 variables)
r�rXr*rrrr	r�r�r7)r�rfr�rr:)rrJr�r�rXr*rrrr�r>r�r7r	r�s              �rWrzStataWriterUTF8.__init__ds����"�?�!�Z�Z��]�e�3�c��G�
�J�
&��A�B�B�
��^��
�
�1�
�� 5��#��
�
	�����'�#��!�!�+�%�%�#�+�	�
	
�$��r�c���|D]`}t|�dkr#|dks|dkDr|dks|dkDr|dks|dkDr|dk7sdt|�cxkrd	ksn|d
vs�O|j|d�}�b|S)a�
        Validate variable names for Stata export.

        Parameters
        ----------
        name : str
            Variable name

        Returns
        -------
        str
            The validated name with invalid characters replaced with
            underscores.

        Notes
        -----
        Stata 118+ supports most unicode characters. The only limitation is in
        the ascii range where the characters supported are a-z, A-Z, 0-9 and _.
        r#rprqrrrsr�rtrr�>�×�÷)r�rurvs   rWrwz'StataWriterUTF8._validate_variable_name�sy��*�A���F�S�L��S��A��G��S��A��G��S��A��G��S���#�a�&�&�3�&���$��|�|�A�s�+����r�)
NTNNNNNNrN)rJr�r�r$r�r�rXr�r*rrr�rrrr�r�rjr>rr�r.r7rr	r�r�r3r�)	r6r7r8r9rrhrrwr&r's@rWrmrms����^�@#*�I��)�59� � $�&*�!%�6:�26�"�*1�15�*$�AE�*$�,�*$��*$�2�	*$�
�*$��
*$�$�*$��*$�4�*$�0�*$��*$�(�*$�/�*$�>�*$� 
�!*$�X#r�rm)r�r'r�rr�r'r!)r6rr�r�r�r�r�rr�r�r�r�r�rr�r�r�rr(r�r�r.r7rr�zDataFrame | StataReader)r0rr�r)r�r
r�r`r�r
)r�rr�rg)r�r�r:zlist[Hashable]r�r�)rvrgrr'r�r`)r�F)rr'rEr`rFr�r�r)rvrgrr'rFr�r�r`)r�rkr�r`r�r5)vr9�
__future__r�collectionsrrr�iorr�r�r��typingrr	r
rrr
r��numpyr��pandas._libsr�pandas._libs.libr�pandas._libs.writersr�
pandas.errorsrrrr�pandas.util._decoratorsrr�pandas.util._exceptionsr�pandas.core.dtypes.baser�pandas.core.dtypes.commonrrr�pandas.core.dtypes.dtypesr�pandasrrrr r!r"r#�pandas.core.framer$�pandas.core.indexes.baser%�pandas.core.indexes.ranger&�pandas.core.seriesr'�pandas.core.shared_docsr(�pandas.io.commonr)�collections.abcr*r+�typesr,r-�pandas._typingr.r/r0r1r2r3rK�_statafile_processing_params1�_statafile_processing_params2�_chunksize_params�_iterator_params�
_reader_notes�_read_stata_docr%r$r�rCrhr�r�r�r�r�r�r�r�r�r<r�rp�Iteratorr�r+r�r'r6r<rBrHrLr�rrr(rmr:r�rW�<module>r�s��
�#����	�
�
������(�5����5�2���
7����(�*�0�%�0�'���$����4��!N��!G��$��
"��(�
���� ��� �������
�%�&�)=�=�>�?�
�� �!�"����?8��t��� ��� �
����� ��� ����
�%�&�'�(�
�� �!�"������$O�
��d�A�q�)��U�)�]�@h7�V(��u����E��#��%��	��%�	�)����y�xv�v�r%�O�%�>C�C�L\
�\
�~|&�+�s�|�|�|&�~
�/���!%� �!� �$(�#� ��&-�-1�!�4�!��!��	!�
�!��
!��!�"�!��!��!��!�$�!�+�!��!��!�HD�0�C�2�!G�JGL�*G��*G�(+�*G�?C�*G��*G�Z� �!2�3�$�%:�;�g�E��KB�+�KB�	�KB�\'G�T1�r�r�jz�[�z�zr�n�rr�
