The for loop is forbidden in this exercise. The goal is to use groupby and apply.
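As a minimal sketch of the pattern, the snippet below applies a function to each group without writing a loop. The data, the `value` column and the `demean` helper are hypothetical and are not part of the exercise; `group_keys=False` is used here so that the result keeps the original row index.

```python
import pandas as pd

# Hypothetical data, not taken from the exercise.
df_demo = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 10.0, 20.0],
})

def demean(values):
    # Called once per group with that group's "value" column as a Series.
    return values - values.mean()

# groupby splits the rows by group, apply runs demean on each piece,
# and pandas concatenates the pieces back together.
centered = df_demo.groupby("group", group_keys=False)["value"].apply(demean)
print(centered)
```

The same split, apply, combine structure is what the questions below check, with the winsorizing function in place of `demean`.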

  1. This question is validated if the output is:

        df = pd.DataFrame(range(1,11), columns=['sequence'])
        print(winsorize(df, [0.20, 0.80]).to_markdown())
    
    |    |   sequence |
    |---:|-----------:|
    |  0 |        2.8 |
    |  1 |        2.8 |
    |  2 |        3   |
    |  3 |        4   |
    |  4 |        5   |
    |  5 |        6   |
    |  6 |        7   |
    |  7 |        8   |
    |  8 |        8.2 |
    |  9 |        8.2 |
  2. This question is validated if the output is the same as the one returned by:

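    import numpy as np

    def winsorize(df_series, quantiles):
        """
        df_series: pd.DataFrame or pd.Series
        quantiles: list of two quantiles, e.g. [0.05, 0.95]
        """
        # Lower and upper bounds computed from the requested quantiles.
        min_value = np.quantile(df_series, quantiles[0])
        max_value = np.quantile(df_series, quantiles[1])

        # Values outside the bounds are replaced by the nearest bound.
        return df_series.clip(lower=min_value, upper=max_value)


    df.groupby("group")[['sequence']].apply(winsorize, [0.05, 0.95])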
    

    The output can also be a Series instead of a DataFrame.

    The expected output (first rows) is:

    |    |   sequence |
    |---:|-----------:|
    |  0 |       1.45 |
    |  1 |       2    |
    |  2 |       3    |
    |  3 |       4    |
    |  4 |       5    |
    |  5 |       6    |
    |  6 |       7    |
    |  7 |       8    |
    |  8 |       9    |
    |  9 |       9.55 |
    | 10 |      11.45 |
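
The boundary values in both expected outputs can be checked by hand, since `np.quantile` interpolates linearly between neighbouring values by default. The sketch below reuses the `winsorize` function from question 2. The grouped data is an assumption: two groups holding 1..10 and 11..20, chosen only because they are consistent with the first rows shown above; the exercise's actual `df` is not reproduced here. `group_keys=False` is also an assumption, added so that the result keeps the plain 0..N index shown in the expected output on recent pandas versions.

```python
import numpy as np
import pandas as pd

def winsorize(df_series, quantiles):
    # Clip values to the [lower quantile, upper quantile] range.
    min_value = np.quantile(df_series, quantiles[0])
    max_value = np.quantile(df_series, quantiles[1])
    return df_series.clip(lower=min_value, upper=max_value)

# Question 1: for 1..10 the 0.20 quantile sits at position 0.20 * 9 = 1.8,
# between the values 2 and 3, so it is about 2 + 0.8 = 2.8.
# The 0.80 quantile sits at position 7.2, so it is about 8 + 0.2 = 8.2.
single = pd.DataFrame(range(1, 11), columns=["sequence"])
q1 = winsorize(single, [0.20, 0.80])
print(q1)

# Question 2 on ASSUMED data: group "a" holds 1..10, group "b" holds 11..20.
grouped = pd.DataFrame({
    "group": ["a"] * 10 + ["b"] * 10,
    "sequence": list(range(1, 21)),
})
q2 = grouped.groupby("group", group_keys=False)[["sequence"]].apply(
    winsorize, [0.05, 0.95]
)
print(q2.head(11))
```

Under these assumptions the bounds are computed per group, which is why row 10 is clipped against the second group's own 0.05 quantile rather than against a quantile of the whole column.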