
      ArmRoundMan


      DataFrames are the central data structure in the pandas API. A DataFrame is like a spreadsheet, with numbered rows and named columns.

      To make the examples easier to follow, first import the required modules.

      import pandas as pd
      import numpy as np

      1. Create, access and modify.

      Read a .csv file into a pandas DataFrame:

      chicago_taxi_dataset = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/chicago_taxi_train.csv")

      Key argument of  read_csv() :

      • filepath_or_buffer: various

      Either a path to a file (a str, pathlib.Path, or py._path.local.LocalPath), a URL (including http, ftp, and S3 locations), or any object with a  read()  method (such as an open file or StringIO).
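      Since  read_csv()  accepts any object with a  read()  method, a CSV can also be parsed straight from an in-memory buffer; a minimal sketch (the column names are made up for illustration):

      ```python
      import io

      import pandas as pd

      # Any object with a read() method works, e.g. an in-memory StringIO buffer.
      csv_buffer = io.StringIO("temperature,activity\n0,3\n10,7\n20,9\n")
      df = pd.read_csv(csv_buffer)
      print(df.shape)  # (3, 2)
      ```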

      The following code instantiates a  pd.DataFrame  class to generate a DataFrame.

       1 # Create and populate a 5x2 NumPy array.
       2 my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])
       3 
       4 # Create a Python list that holds the names of the two columns.
       5 my_column_names = ['temperature', 'activity']
       6 
       7 # Create a DataFrame.
       8 my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)
       9 
      10 # Print the entire DataFrame
      11 print(my_dataframe)

      Inspecting its  .index  attribute gives

      RangeIndex(start=0, stop=5, step=1)

      If an index argument is passed at construction time, say by modifying line 8 of the above code to

      my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names, index=[10, 20, 30, 40, 50])

      the index attribute is

      Index([10, 20, 30, 40, 50], dtype='int64')

       len(my_dataframe.index)  counts the number of rows of the DataFrame.

      DataFrame.reset_index(level=None, *, drop=False, inplace=False, col_level=0, col_fill='', allow_duplicates=_NoDefault.no_default, names=None) method resets the index, where

      • level: int, str, tuple, or list

        Only remove the given levels from the index. Removes all levels by default.

      • drop: bool; if False, insert the original index as a column in the DataFrame.
      • inplace: bool, whether to modify the DataFrame rather than creating a new one.
      • names: int, str or 1-dimensional list

        Using the given string, rename the DataFrame column which contains the original index data. If the DataFrame has a MultiIndex, this has to be a list or tuple with length equal to the number of levels.

      It returns a new DataFrame or None if inplace=True.

      DataFrame.set_index(keys, *, drop=True, append=False, inplace=False, verify_integrity=False) does the reverse, where

      • keys: label or array-like or list of labels/arrays. This parameter can be either a single column key, or other types described HERE.
      • drop: bool, whether to delete columns to be used as the new index.
      • append: bool, whether to append columns to existing index or just replace the original index. For more information, click HERE
      • inplace: bool, the same as the one above.
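      A short round-trip sketch of  set_index()  and  reset_index()  on a toy DataFrame:

      ```python
      import pandas as pd

      df = pd.DataFrame({'temperature': [0, 10, 20], 'activity': [3, 7, 9]})

      # Promote the 'temperature' column to the index
      # (drop=True, the default, removes it from the columns).
      indexed = df.set_index('temperature')
      print(indexed.index.name)  # temperature

      # reset_index does the reverse: the index goes back to being an
      # ordinary column (drop=False) and a fresh RangeIndex is installed.
      restored = indexed.reset_index()
      print(list(restored.columns))  # ['temperature', 'activity']
      ```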

       

       

      You may add a new column to an existing pandas DataFrame just by assigning values to a new column name.

      # Create a new column named adjusted.
      my_dataframe["adjusted"] = my_dataframe["activity"] + 2
      # Print the entire DataFrame.
      print(my_dataframe)

      Pandas provides multiple ways to isolate specific rows, columns, slices, or cells in a DataFrame.

      print(my_dataframe.activity)  # Equivalent to the column selection my_dataframe['activity']
      print("Rows #0, #1, and #2:")
      print(my_dataframe.head(3), '\n')

      print("Row #2:")
      print(my_dataframe.iloc[[2]], '\n')  # The type of the result is DataFrame.
      print("Row #2:")
      print(my_dataframe.iloc[2], '\n')  # The type of the result is Series.
      print("Rows #1, #2, and #3:")
      print(my_dataframe[1:4], '\n')  # Note the slice starts from the second row, not the first.

      print("Column 'temperature':")
      print(my_dataframe['temperature'])

      training_df = chicago_taxi_dataset[['TRIP_MILES', 'TRIP_SECONDS', 'FARE', 'COMPANY', 'PAYMENT_TYPE', 'TIP_RATE']]
      training_df.head(200)

       

      To get random samples of a DataFrame, use the DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False) method, where

      • n: int, optional

        Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

      • frac: float, optional

        Fraction of axis items to return. Cannot be used with n.

      • replace: bool, whether to allow sampling of the same row more than once. Must be set to True if frac > 1.
      • random_state: If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.
      • axis: {0 or ‘index’, 1 or ‘columns’, None}, default is stat axis for given data type.
      • ignore_index: bool, deciding if the resulting index will be labeled 0, 1, …, n-1.

      It returns a new object of same type as caller containing n items.
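      A small sketch of  sample() , illustrating n, frac, replace, and random_state on a toy DataFrame:

      ```python
      import pandas as pd

      df = pd.DataFrame({'x': range(10)})

      # Sample 3 distinct rows, reproducibly, via random_state.
      sampled = df.sample(n=3, random_state=42)
      print(len(sampled))  # 3

      # frac > 1 requires replace=True, since rows must be able to repeat.
      boot = df.sample(frac=1.5, replace=True, random_state=0)
      print(len(boot))  # 15
      ```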

       

      Q: What's the difference between Series and DataFrame?

      A: A Series is a one-dimensional labeled array; each column of a DataFrame is a Series. (Extracting a single row, e.g. with .iloc[n], also returns a Series, which is why a Series is sometimes described as a row — Google Gemini insists on "row" for this reason.)

      How to index a particular cell of the DataFrame?

      # Create a Python list that holds the names of the four columns.
      my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']

      # Create a 3x4 numpy array, each cell populated with a random integer.
      my_data = np.random.randint(low=0, high=101, size=(3, 4))

      # Create a DataFrame.
      df = pd.DataFrame(data=my_data, columns=my_column_names)

      # Print the entire DataFrame.
      print(df)

      # Print the value in row #1 of the Eleanor column.
      print("\nSecond row of the Eleanor column: %d\n" % df['Eleanor'][1])  # Chained indexing

      Q: How to convert a Series to ndarray?

      A: Series.values property returns the Series as an ndarray or ndarray-like depending on the dtype. However, it is recommended to use Series.array or Series.to_numpy(), depending on whether you need a reference to the underlying data or a NumPy array.
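      A short sketch of the three accessors on a toy Series:

      ```python
      import numpy as np
      import pandas as pd

      s = pd.Series([1, 2, 3])

      # .values may return an ndarray or an extension array, depending on dtype.
      print(type(s.values))

      # Preferred: to_numpy() always yields a NumPy array,
      # while .array gives a reference to the underlying pandas array.
      arr = s.to_numpy()
      print(isinstance(arr, np.ndarray))  # True
      print(type(s.array))
      ```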

      The following code shows how to add a new column to an existing DataFrame through row-by-row calculation between or among columns:

      # Create a column named Janet whose contents are the sum
      # of two other columns.
      df['Janet'] = df['Tahani'] + df['Jason']

      # Print the enhanced DataFrame.
      print(df)

      Pandas provides two different ways to duplicate a DataFrame:

      • Referencing: the two stay linked; changes to one show up in the other.
      • Copying: the two are fully independent.

       

      # Create a reference by assigning my_dataframe to a new variable.
      print("Experiment with a reference:")
      reference_to_df = df

      # Print the starting value of a particular cell.
      print("  Starting value of df: %d" % df['Jason'][1])
      print("  Starting value of reference_to_df: %d\n" % reference_to_df['Jason'][1])

      # Modify a cell in df. Use .at rather than chained indexing for assignment:
      # chained indexing may write to a temporary copy instead of df itself.
      df.at[1, 'Jason'] = df['Jason'][1] + 5
      print("  Updated df: %d" % df['Jason'][1])
      print("  Updated reference_to_df: %d\n\n" % reference_to_df['Jason'][1])

      Confusion solutions: what's the difference among  .at ,  .loc ,  .iloc ,  .iat , and chained indexing?

       .iloc  is primarily integer-position based (from 0 to  length-1  of the axis), but may also be used with a boolean array. Allowed inputs are:

      • An integer.

      • A list or array of integers.

      • A slice object with ints, e.g. 1:7.

      • A boolean array.

      • A tuple of row and column indexers, each element being one of the above, e.g. (0, 1).

      Note: Deprecated since version 2.2.0: returning a tuple from a callable is deprecated for  pandas.DataFrame.iloc , so it is better not to pass a callable.
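      A minimal side-by-side sketch of the accessors, on a toy DataFrame (the labels and values are made up for illustration):

      ```python
      import pandas as pd

      df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[10, 20, 30])

      print(df.loc[20, 'b'])   # 5 -- label-based; accepts slices, lists, booleans
      print(df.iloc[1, 1])     # 5 -- integer-position based
      print(df.at[20, 'b'])    # 5 -- label-based fast access to a single scalar
      print(df.iat[1, 1])      # 5 -- position-based fast access to a single scalar

      # Chained indexing reads fine but is unreliable for assignment:
      print(df['b'][20])       # 5 -- two separate __getitem__ calls
      ```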

      To make a copy of a DataFrame, use the DataFrame.copy(deep=True) method. When deep=True (the default), a new object is created with a copy of the calling object’s data and indices; modifications to the data or indices of the copy will not be reflected in the original object. Things are a bit more complicated with deep=False after pandas 3.0.

       

      The following code demonstrates making a copy:

      1 copy_of_my_dataframe = my_dataframe.copy()

        DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')  drops specified labels from rows or columns, where

      • labels: single label or list-like

        Index or column labels to drop. A tuple will be used as a single label and not treated as a list-like.

      • axis: {0 or ‘index’, 1 or ‘columns’}, whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).

      • index: single label or list-like

        Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).

      • columns: single label or list-like

        Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

      • level: int or level name, optional

        For MultiIndex, the level from which the labels will be removed.

      • inplace: bool, whether to do the operation in place and return None, or return a copy.

      • errors: {‘ignore’, ‘raise’}. If ‘ignore’, suppress errors and drop only existing labels.

      It returns a DataFrame with the specified index or column labels removed, or None if inplace=True.
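      A short sketch of  drop()  on a toy DataFrame, touching columns, index, and errors='ignore':

      ```python
      import pandas as pd

      df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

      # Drop a column; columns='b' is equivalent to labels='b' with axis=1.
      without_b = df.drop(columns='b')
      print(list(without_b.columns))  # ['a', 'c']

      # Drop a row by its index label; df itself is untouched (inplace=False).
      without_row0 = df.drop(index=0)
      print(len(without_row0))  # 1

      # errors='ignore' suppresses the KeyError for missing labels.
      same = df.drop(columns='missing', errors='ignore')
      print(same.shape)  # (2, 3)
      ```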

       

      2. Data exploration.

      To preview the first n rows of a large DataFrame, use  DataFrame.head(n=5) ; it returns the same type as the caller.

      Use the  DataFrame.describe(percentiles=None, include=None, exclude=None)  method to view descriptive statistics about the dataset, where

      • percentiles: list-like of numbers, optional. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
      • include: ‘all’, list-like of dtypes or None, optional. A white list of data types to include in the result. Ignored for  Series . 
        • ‘all’ : All columns of the input will be included in the output.

       training_df.describe(include='all')  results in

        TRIP_MILES TRIP_SECONDS FARE COMPANY PAYMENT_TYPE TIP_RATE
      count 31694.000000 31694.000000 31694.000000 31694 31694 31694.000000
      unique NaN NaN NaN 31 7 NaN
      top NaN NaN NaN Flash Cab Credit Card NaN
      freq NaN NaN NaN 7887 14142 NaN
      mean 8.289463 1319.796397 23.905210 NaN NaN 12.965785
      std 7.265672 928.932873 16.970022 NaN NaN 15.517765
      min 0.500000 60.000000 3.250000 NaN NaN 0.000000
      25% 1.720000 548.000000 9.000000 NaN NaN 0.000000
      50% 5.920000 1081.000000 18.750000 NaN NaN 12.200000
      75% 14.500000 1888.000000 38.750000 NaN NaN 20.800000
      max 68.120000 7140.000000 159.250000 NaN NaN 648.600000
      # How many cab companies are in the dataset? Try the DataFrame.nunique(axis=0, dropna=True) method, where
      • axis: {0 or ‘index’, 1 or ‘columns’}
      • dropna: bool, whether to exclude NaN in the counts.

      It returns Series. Series.nunique(dropna=True) method returns an int.

       num_unique_companies =  training_df['COMPANY'].nunique()

       

      # What is the most frequent payment type?
      First, count the frequency of each distinct row in the DataFrame with  DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True) , where
      • subset: label or list of labels of columns to use when counting unique combinations, optional.
      • normalize: bool, whether to return proportions rather than frequencies.
      • sort: bool. Sort by frequencies when True, otherwise by DataFrame column values (original order).
      • ascending: bool.
      • dropna: bool, whether to exclude rows containing NaN.

      It returns a Series. The Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True) method returns a Series containing counts of unique values, where

      • bins: int, optional. Number of bins.

        Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.
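      A quick sketch of  value_counts()  with normalize and bins on a toy Series:

      ```python
      import pandas as pd

      s = pd.Series([1, 1, 2, 5, 9])

      # Plain counts, sorted by frequency; idxmax gives the most frequent value.
      print(s.value_counts().idxmax())  # 1

      # normalize=True yields proportions instead of raw counts.
      print(s.value_counts(normalize=True).max())  # 0.4

      # bins groups numeric values into equal-width half-open intervals.
      binned = s.value_counts(bins=2)
      print(len(binned))  # 2
      ```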

      Second, find the value with the largest frequency using the Series.idxmax(axis=0, skipna=True, *args, **kwargs) method, where
      • axis: {0 or ‘index’} isn't used. It's needed for compatibility with DataFrame.
      • skipna: bool, whether to exclude NA/null values. If the entire Series is NA, the result will be NA.
      • *args, **kwargs: additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

       It returns the index label of the maximum value.

      most_freq_payment_type = training_df['PAYMENT_TYPE'].value_counts().idxmax()
      Often we want summary statistics of the numerical features; before that we need to find all of the numerical columns. First, try  DataFrame.select_dtypes(include=None, exclude=None) , where
      • include, exclude: scalar or list-like. For example, 
        • To select all numeric types, use np.number or 'number'

      It returns the subset of the DataFrame.

       Second, try  DataFrame.columns  to get the labels of the subset DataFrame.
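      A minimal sketch combining  select_dtypes()  and  .columns  on a toy DataFrame:

      ```python
      import pandas as pd

      df = pd.DataFrame({'miles': [1.5, 2.0],
                         'seconds': [300, 600],
                         'company': ['A', 'B']})

      # Keep only numeric columns, then read off their labels.
      numeric = df.select_dtypes(include='number')
      print(list(numeric.columns))  # ['miles', 'seconds']
      ```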
      # What is the maximum fare?
      Try  Series.max(axis=0, skipna=True, numeric_only=False, **kwargs) method, where
      • axis: {index (0)} is unused as well.
      • skipna: bool, same as the above one.
      • numeric_only: not implemented.

      It returns scalar.

       max_fare = training_df['FARE'].max()

       The analogous method for DataFrame is  DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs) , where

      • axis: {index (0), columns (1)}

        For DataFrames, specifying  axis=None  will apply the aggregation across both axes.

      • numeric_only: bool, whether to include only float, int, or boolean data.
      To calculate the standard deviation of all numerical features, try  DataFrame.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs) , where
      • ddof: int, Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. To get the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1).
       It returns a Series or DataFrame (if a level is specified). For a Series, try  Series.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)  instead.
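      A short sketch contrasting ddof=1 (the pandas default) with ddof=0 (NumPy's default) on a toy Series:

      ```python
      import pandas as pd

      s = pd.Series([1.0, 2.0, 3.0, 4.0])

      # Default ddof=1 (sample standard deviation): divisor is N - 1.
      print(round(s.std(), 3))        # 1.291

      # ddof=0 matches numpy.std's population formula: divisor is N.
      print(round(s.std(ddof=0), 3))  # 1.118
      ```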
       
      # What is the mean distance across all trips?
      Try the  Series.mean(axis=0, skipna=True, numeric_only=False, **kwargs)  method. Its return type and parameters are the same as above.
       mean_distance = training_df['TRIP_MILES'].mean()
       
      # How many values are missing from the features?
      First, flag the missing entries with the isnull() method:
      missing_values = training_df.isnull()

      It returns a boolean same-sized DataFrame indicating if the values are NA.

      Second, count the number of True values with the DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs) method, where

      • axis: {index (0), columns (1)}
      • skipna: bool, whether to exclude NA/null values when computing the result.
      • numeric_only: bool, whether to include only float, int, boolean columns.
      • min_count: int, required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

      It returns a Series or scalar. The same method for Series is Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs), where

      • axis, numeric_only are the same as Series.max.


       missing_values = training_df.isnull().sum().sum()

      To view the correlation matrix among features (possibly including the label), try the DataFrame.corr(method='pearson', min_periods=1, numeric_only=False) method, where

      • method: the correlation method:
        • pearson : standard correlation coefficient

        • kendall : Kendall Tau correlation coefficient

        • spearman : Spearman rank correlation

        • callable: callable with input two 1d ndarrays and returning a float. 
      • numeric_only: Same as the one in DataFrame.mean().

      It returns a DataFrame of correlation matrix.


       training_df.corr(numeric_only = True)

      which gives the following result:

                    TRIP_MILES  TRIP_SECONDS      FARE  TIP_RATE
      TRIP_MILES 1.000000 0.800855 0.975344 -0.049594
      TRIP_SECONDS 0.800855 1.000000 0.830292 -0.084294
      FARE 0.975344 0.830292 1.000000 -0.070979
      TIP_RATE -0.049594 -0.084294 -0.070979 1.000000

      3. Options and settings

      Options have a case-insensitive name. You can get/set options directly as attributes of the top-level  options  attribute:
      pd.options.display.max_rows = 10
      If max_rows is exceeded, pandas switches to a truncated view. Depending on `large_repr`, objects are either centrally truncated or printed as a summary view. A value of None means unlimited.
       display.float_format : callable. The callable should accept a floating-point number and return a string with the desired format of the number. For example, using the Format Specification Mini-Language:

      options.display.float_format = "{:.1f}".format
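      A minimal sketch of setting and resetting  display.float_format :

      ```python
      import pandas as pd

      s = pd.Series([3.14159, 2.71828])

      # Limit displayed floats to one decimal place via the Mini-Language.
      pd.options.display.float_format = "{:.1f}".format
      rep = repr(s)
      print(rep)  # values render as 3.1 and 2.7

      # Reset to the default formatter afterwards.
      pd.options.display.float_format = None
      ```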

       
      posted on 2024-08-22 20:08 by 后生那各膊客圓了