diff --git a/subjects/ai/nlp/README.md b/subjects/ai/nlp/README.md
index b6a5d3409..43b14f971 100644
--- a/subjects/ai/nlp/README.md
+++ b/subjects/ai/nlp/README.md
@@ -95,7 +95,7 @@ The goal of this exercise is to learn to deal with punctuation. In Natural Langu
 
 # Exercise 3: Tokenization
 
-The goal of this exercise is to learn to tokenize as text. This step is important because it splits the text into token. A token could be a sentence or a word.
+The goal of this exercise is to learn [to tokenize](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) as text. This step is important because it splits the text into token. A token could be a sentence or a word.
 
 ```
 text = """Bitcoin is a cryptocurrency invented in 2008 by an unknown person or group of people using the name Satoshi Nakamoto. The currency began use in 2009 when its implementation was released as open-source software."""
@@ -106,8 +106,6 @@ text = """Bitcoin is a cryptocurrency invented in 2008 by an unknown person or g
 
 2. Tokenize this text using `word_tokenize` from NLTK.
 
-_Resources: [How to Get Started with NLP – 6](https://www.analyticsvidhya.com/blog/2019/07how-get-started-nlp-6-unique-ways-perform-tokenization/)_
-
 ---
 
 ---
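
Note on the exercise touched by this diff: it asks for sentence and word tokenization with NLTK. As a rough illustration of what those two steps produce, here is a minimal sketch that uses only the Python standard library. The helper names `simple_sent_tokenize` and `simple_word_tokenize` are hypothetical and are not NLTK functions. The regular expressions are deliberately naive stand-ins and are not expected to match the output of NLTK's `sent_tokenize` and `word_tokenize` on arbitrary text (abbreviations such as "Dr." would be split incorrectly, for example).

```python
import re

text = """Bitcoin is a cryptocurrency invented in 2008 by an unknown person or group of people using the name Satoshi Nakamoto. The currency began use in 2009 when its implementation was released as open-source software."""


def simple_sent_tokenize(raw):
    # Hypothetical helper: split after '.', '!' or '?' when the next
    # non-space character is an uppercase letter. A naive heuristic only.
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", raw.strip())


def simple_word_tokenize(raw):
    # Hypothetical helper: a token is either a word (internal hyphens and
    # apostrophes allowed) or a single punctuation character.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", raw)


sentences = simple_sent_tokenize(text)
words = simple_word_tokenize(text)

for sentence in sentences:
    print(sentence)
print(words)
```

The sketch treats a sentence and a word as two granularities of the same operation, which is the point the exercise text makes when it says a token could be a sentence or a word.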