- `del` works, but it is not a solution I recommend. For this exercise it is accepted. The expected solution uses `drop` with `axis=1`. `inplace=True` may be useful to avoid assigning the result to a variable.
- The preferred solution is `set_index` with `inplace=True`. As long as the DataFrame returns the output below, the solution is accepted. If the type of the index is not `dtype='datetime64[ns]'`, the solution is not accepted.

  Input: `df.head().index`

  Output: `DatetimeIndex(['2006-12-16', '2006-12-16', '2006-12-16', '2006-12-16', '2006-12-16'], dtype='datetime64[ns]', name='Date', freq=None)`
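
As a minimal sketch of the two steps above, assuming a tiny made-up DataFrame (the `Time` column and the sample values are hypothetical, not the exercise data), dropping a column and promoting `Date` to the index could look like this:

```python
import pandas as pd

# Hypothetical miniature dataset; "Time" is an assumed example of a column to drop.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2006-12-16", "2006-12-16", "2006-12-17"]),
    "Time": ["17:24:00", "17:25:00", "00:00:00"],
    "Global_active_power": [4.216, 5.360, 1.044],
})

# axis=1 targets columns; inplace=True modifies df instead of returning a new object.
df.drop("Time", axis=1, inplace=True)

# Promote the Date column to the index, again modifying df in place.
df.set_index("Date", inplace=True)

print(df.index)
```

Because `Date` was parsed with `pd.to_datetime` before the call, `set_index` yields a `DatetimeIndex` rather than an index of plain strings.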
- The preferred solution is `pd.to_numeric` with `errors='coerce'`. The solution is accepted if all types are `float64`.

  Input: `df.dtypes`

  Output:

      Global_active_power      float64
      Global_reactive_power    float64
      Voltage                  float64
      Global_intensity         float64
      Sub_metering_1           float64
      dtype: object
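
The conversion can be sketched as follows. The sample values and the `?` placeholder are assumptions for illustration. Note that the coercion option of `pd.to_numeric` is spelled `errors="coerce"`: values that cannot be parsed become `NaN`, which forces the column to `float64`.

```python
import pandas as pd

# Hypothetical raw data: "?" stands in for a non-numeric placeholder (assumption).
raw = pd.DataFrame({
    "Global_active_power": ["4.216", "?", "5.360"],
    "Voltage": ["234.84", "233.63", "?"],
})

# Apply pd.to_numeric to every column; unparseable entries are turned into NaN.
converted = raw.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```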
- `df.describe()` is expected.
- You should have noticed that 25979 rows contain missing values (129895 missing values in total). `df.isna().sum()` checks the number of missing values, and `df.dropna()` with `inplace=True` removes the rows with missing values. The solution is accepted if you used `dropna` and the number of missing values is 0.
- Two solutions are accepted:
  - `df.loc[:,'A'] = (df['A'] + 1) * 0.06`
  - Using `apply`: `df.loc[:,'A'] = df.loc[:,'A'].apply(lambda x: (x+1)*0.06)`

  You may wonder whether `df.loc[:,'A']` is required and whether `df['A'] = ...` works too. It is not safe to rely on the latter. This is important in Pandas: depending on the version of Pandas and on how `df` was created, `df['A'] = ...` may raise a `SettingWithCopyWarning`. The reason is that, when `df` is itself a slice of another DataFrame, you may be assigning a value to a copy of the DataFrame and not to the DataFrame itself. More details: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
- The solution is accepted as long as the output of `print(filtered_df.head().to_markdown())` is:

  | Date                | Global_active_power | Global_reactive_power |
  |:--------------------|--------------------:|----------------------:|
  | 2008-12-27 00:00:00 |               0.996 |                 0.066 |
  | 2008-12-27 00:00:00 |               1.076 |                 0.162 |
  | 2008-12-27 00:00:00 |               1.064 |                 0.172 |
  | 2008-12-27 00:00:00 |                1.07 |                 0.174 |
  | 2008-12-27 00:00:00 |               0.804 |                 0.184 |

  Check that the number of rows is equal to 449667.
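
The missing-value check and the column update described above can be sketched on a small made-up DataFrame. The columns `A` and `B` and their values are hypothetical. The sketch also checks that the vectorized expression and the `apply` version give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the NaN positions are made up for illustration.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [10.0, 20.0, np.nan, 40.0],
})

# Missing values per column, and number of rows with at least one missing value.
missing_per_column = df.isna().sum()
rows_with_missing = int(df.isna().any(axis=1).sum())

# Remove the incomplete rows in place, then confirm nothing is missing anymore.
df.dropna(inplace=True)
remaining_missing = int(df.isna().sum().sum())

# The two accepted ways of computing the new column values.
vectorized = (df["A"] + 1) * 0.06
with_apply = df["A"].apply(lambda x: (x + 1) * 0.06)

# Write the result back through .loc.
df.loc[:, "A"] = vectorized

print(missing_per_column)
print(df)
```

The vectorized form is usually preferable on large frames because it avoids calling a Python function once per element.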
- The solution is accepted if the output is:

      Global_active_power        0.254
      Global_reactive_power      0.000
      Voltage                  238.350
      Global_intensity           1.200
      Sub_metering_1             0.000
      Name: 2007-02-16 00:00:00, dtype: float64
- The solution is accepted if the output is `Timestamp('2009-02-22 00:00:00')`.
- The solution is accepted if the output of `print(sorted_df.tail().to_markdown())` is:

  | Date                | Global_active_power | Global_reactive_power | Voltage |
  |:--------------------|--------------------:|----------------------:|--------:|
  | 2008-08-28 00:00:00 |               0.076 |                     0 |  234.88 |
  | 2008-08-28 00:00:00 |               0.076 |                     0 |  235.18 |
  | 2008-08-28 00:00:00 |               0.076 |                     0 |   235.4 |
  | 2008-08-28 00:00:00 |               0.076 |                     0 |  235.64 |
  | 2008-12-08 00:00:00 |               0.076 |                     0 |   236.5 |

- The solution is based on `groupby`, which creates groups based on the index `Date` and aggregates each group using the mean. The solution is accepted if the output is:

      Date
      2006-12-16    3.053475
      2006-12-17    2.354486
      2006-12-18    1.530435
      2006-12-19    1.157079
      2006-12-20    1.545658
                      ...
      2010-12-07    0.770538
      2010-12-08    0.367846
      2010-12-09    1.119508
      2010-12-10    1.097008
      2010-12-11    1.275571
      Name: Global_active_power, Length: 1433, dtype: float64
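
A minimal sketch of the daily aggregation, assuming a made-up frame in which several readings share the same `Date` index value (the numbers are hypothetical):

```python
import pandas as pd

# Hypothetical readings: two measurements per day, indexed by Date.
df = pd.DataFrame(
    {"Global_active_power": [4.0, 2.0, 1.0, 3.0]},
    index=pd.DatetimeIndex(
        ["2006-12-16", "2006-12-16", "2006-12-17", "2006-12-17"], name="Date"
    ),
)

# Group rows by the index level named "Date" and average each group.
daily_mean = df.groupby("Date")["Global_active_power"].mean()

print(daily_mean)
```

Each distinct date becomes one row of the result, which is why the expected output of the exercise has one entry per day.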