From 7f3149d838ed897e8d1c9637fb1ae1aab5727bbf Mon Sep 17 00:00:00 2001
From: brad-gh <32170926+brad-gh@users.noreply.github.com>
Date: Fri, 16 Sep 2022 01:18:53 +0200
Subject: [PATCH] add leakage intro

---
 projects/project4/README.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/projects/project4/README.md b/projects/project4/README.md
index efcd024..602b353 100644
--- a/projects/project4/README.md
+++ b/projects/project4/README.md
@@ -23,12 +23,8 @@ The first file contains SP500 index data (OHLC: 4 time-series) and the other fil
 
 Note: Financial data can be complex and tricky to analyse for a lot of reasons. In order to focus on Time Series forecasting, the project gives access to a "simplified" financial dataset. For instance, we consider the composition of the SP500 remains similar over time which is not true and which introduces a "survivor bias". Plus, the data during covid-19 was removed because it may have a significant impact on the backtesting.
 
-Note: Financial data can be complex and tricky to analyse for a lot of reasons. In order to focus on Time Series forecasting, the project gives access to a "simplified" financial dataset. For instance, we consider the composition of the SP500 remains similar over time which is not true and which introduces a "survivor bias". Plus, the data during covid-19 was removed because it may have a significant impact on the backtesting.
-
-Note: Financial data can be complex and tricky to analyse for a lot of reasons. In order to focus on Time Series forecasting, the project gives access to a "simplified" financial dataset. For instance, we consider the composition of the SP500 remains similar over time which is not true and which introduces a "survivor bias". Plus, the data during covid-19 was removed because it may have a significant impact on the backtesting.
-
-**"No leakage" small guide:**
-We assume it is day D and we want to take a position on the next h days on the next day. The position starts on day D+1 (included). To decide wether we take a short or long position the return between day D+1 and D+2 is computed and used as a target. Finally, as features on day contain information until day D 11:59pm, target need to be shifted. As a result, the final dataframe schema is:
+**"No leakage" [intro](https://en.wikipedia.org/wiki/Leakage_(machine_learning)) and small guide:**
+We assume it is day D and we want to take a position on the next h days on the next day. The position starts on day D+1 (included). To decide wether we take a short or long position the return between day D+1 and D+2 is computed and used as a target. Finally, as features on day contain information until day D 11:59pm, target need to be shifted. As a result, the final dataframe schema is:
 
 | Index | Features | Target |
 | ------- | :------------------------: | ---------------: |