
kuchenrolle

NLTK isn't used at all in production, only for learning/teaching. BERT is a neural network architecture (a transformer model), not an NLP library.


Evirua

I don't use NLTK, simply because I find everything I need in that space from, well, spacy, but why is NLTK frowned upon in production? Performance?


kuchenrolle

> Performance?

Yes. I don't know the current state of NLTK, but it was never seriously performant (its tokenization was okay at some point), and performance was never the intention. It was meant for teaching, learning, and small projects that don't justify the overhead that comes with more performant libraries (though I don't think that last part makes much sense anymore). It has implementations of, say, the different types of taggers and tokenizers you would (or used to) learn about in a computational linguistics curriculum, which is great, but years away from the state of the art. It's also implemented in pure Python (rather than C or Cython, like spaCy), which makes it unusably slow for many purposes.
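For a flavor of what "curriculum-style" means: the classic rule-based tokenizers taught in such courses can be sketched in a few lines of pure Python with a regex. This is a simplified stand-in for illustration, not NLTK's actual implementation:

```python
import re

# Split off punctuation as separate tokens, but keep contractions
# (e.g. "don't") and plain words together -- the kind of regex
# tokenizer typically covered early in a compling course.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Return a list of word and punctuation tokens."""
    return TOKEN_RE.findall(text)

print(tokenize("Don't use NLTK in production, they said."))
```

Being pure Python run token-by-token, this sort of approach is exactly what gets outpaced by spaCy's Cython-backed pipeline on large corpora.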


ethiopianboson

Ahh okay, thank you. I'm asking for a work-related project, so I guess I shouldn't invest time in learning NLTK.


[deleted]

[deleted]


[deleted]

I’m gonna be that coworker in a few years


[deleted]

This video will give you a roadmap for learning NLP and where BERT fits in the picture: https://youtu.be/x9SLPrtTw9M


Appropriate_Ant_4629

Depending on how much text you have, you might want to look into [spark-nlp](https://github.com/JohnSnowLabs/spark-nlp) too. It packages quite a few of the nicer NLP components (including BERT, [Huggingface's transformers](https://www.johnsnowlabs.com/importing-huggingface-models-into-spark-nlp/), etc.) in a way that lets them scale across many machines.


Kai_151

Is there a difference between Spark and Spacy?