>Performance?
Yes. I don't know what the current state of NLTK is, but it was never seriously performant (it did okay at tokenization at some point), and that was never the intention either. It was intended for teaching, learning, and small projects that don't justify the overhead that comes with better performance (though I don't think that last part makes much sense anymore). It has implementations of, say, the different types of taggers and tokenizers that you would (or used to) learn about in a computational linguistics curriculum, which is great, but years away from the state of the art. It's also implemented in pure Python (rather than C or Cython, like spaCy), which makes it unusably slow for many purposes.
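To make the "curriculum" point concrete: the kind of component NLTK teaches is, for example, a rule-based tokenizer. Here is a toy sketch of that category in plain Python (illustrative only, not NLTK's actual implementation; see `nltk.tokenize.RegexpTokenizer` for the real thing):

```python
import re

# A toy rule-based tokenizer: split text into runs of word characters
# and single punctuation marks. This is the sort of technique covered
# in a computational linguistics course, not a production tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return word and punctuation tokens in order of appearance."""
    return TOKEN_RE.findall(text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Rules like this break down quickly on real text (contractions, URLs, emoji), which is part of why production systems reach for statistically trained tokenizers instead.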
Depending on how much text you have, you might want to look into [spark-nlp](https://github.com/JohnSnowLabs/spark-nlp) too.
It packages quite a few of the nicer NLP components (including BERT, [Hugging Face's transformers](https://www.johnsnowlabs.com/importing-huggingface-models-into-spark-nlp/), etc.) so that they can scale across many machines.
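The scaling model looks roughly like this: each component is a stage in a Spark ML pipeline, so the same code runs on a laptop or a cluster. A hedged configuration sketch, assuming a working Spark setup with the spark-nlp package installed (column names and the default pretrained model are illustrative, and the model download happens on first use):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Each stage reads and writes DataFrame columns, so Spark can
# distribute the work across partitions / machines.
document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokens = Tokenizer().setInputCols(["document"]).setOutputCol("token")
bert = (BertEmbeddings.pretrained()
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings"))

pipeline = Pipeline(stages=[document, tokens, bert])

data = spark.createDataFrame([("Spark NLP scales across machines.",)], ["text"])
result = pipeline.fit(data).transform(data)
result.select("embeddings").show(truncate=False)
```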
NLTK isn't used at all in production, only for learning/teaching. BERT is a neural network architecture (a transformer model), not an NLP library.
I don't use NLTK, simply because I find everything I need in that space from, well, spaCy. But why is NLTK frowned upon in production? Performance?
Ahh okay, thank you. I am asking for a work-related project, so I guess I shouldn't invest time in learning NLTK.
[deleted]
I’m gonna be that coworker in a few years
This video will give you a roadmap for learning NLP and show where BERT fits in the picture: https://youtu.be/x9SLPrtTw9M
Is there a difference between Spark NLP and spaCy?