feat(nlp-scraper): add link to datasets provided

pull/2435/head
nprimo 4 months ago committed by Niccolò Primo
parent
commit
375bb5c1fb
  1. 3
      subjects/ai/nlp-scraper/README.md
  2. 0
      subjects/ai/nlp-scraper/bbc_news_tests.csv
  3. 0
      subjects/ai/nlp-scraper/bbc_news_train.csv

3
subjects/ai/nlp-scraper/README.md

@@ -50,7 +50,8 @@ https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a
 ### **2. Topic detection:**
 The goal is to detect what the article is dealing with: Tech, Sport, Business,
-Entertainment or Politics. To do so, a labelled dataset is provided. From this
+Entertainment or Politics. To do so, a labelled dataset is provided: [training
+data](bbc_news_train.csv) and [test data](bbc_news_test.csv). From this
 dataset, build a classifier that learns to detect the right topic in the
 article. The trained model should be stored as `topic_classifier.pkl`. Make
 sure the model can be used easily (with the preprocessing pipeline built for
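
For reviewers who want to check the linked dataset against the task described in this hunk, here is a minimal sketch of one way to train a topic classifier and store it as `topic_classifier.pkl`. It is not the reference solution. The column names `Text` and `Category`, the helper function names, and the choice of TF-IDF with logistic regression (via scikit-learn) are all assumptions made for illustration; the README does not prescribe them.

```python
import csv
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def load_labelled_csv(path, text_column="Text", label_column="Category"):
    """Read article texts and topic labels from a CSV file.

    The default column names are assumptions, not confirmed by the README.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    texts = [row[text_column] for row in rows]
    labels = [row[label_column] for row in rows]
    return texts, labels


def train_topic_classifier(texts, labels):
    """Fit a pipeline that bundles preprocessing (TF-IDF) with the classifier.

    Keeping both steps in one Pipeline means the pickled model accepts
    raw strings directly, with no separate preprocessing step to remember.
    """
    model = Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(texts, labels)
    return model


def save_model(model, path="topic_classifier.pkl"):
    """Serialize the fitted pipeline to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path="topic_classifier.pkl"):
    """Load a previously saved pipeline."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Under those assumptions, training would look like `texts, labels = load_labelled_csv("bbc_news_train.csv")`, followed by `save_model(train_topic_classifier(texts, labels))`. Later, `load_model().predict(["some article text"])` returns the predicted topic for a raw string.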

0
subjects/ai/nlp-scraper/BBC News Test.csv → subjects/ai/nlp-scraper/bbc_news_tests.csv


0
subjects/ai/nlp-scraper/BBC News Train.csv → subjects/ai/nlp-scraper/bbc_news_train.csv
