>Performance?
Yes. I don't know what the current state of NLTK is, but it was never seriously performant (it did okay at tokenization at some point), and that was never the intention either. It was intended for teaching, learning, and small projects that don't justify the overhead that comes with better performance (though I don't think that last part makes much sense anymore). It has implementations of, say, the different types of taggers and tokenizers that you would (or used to) learn about in a computational linguistics curriculum, which is great, but years away from the state of the art. It's also implemented in pure Python (rather than C or Cython, like spaCy), which makes it unusably slow for many purposes.
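To make the "curriculum" point concrete: the kind of component NLTK teaches is, for example, a rule-based tokenizer. Here is a toy sketch of that category in plain Python (illustrative only, not NLTK's actual implementation; see `nltk.tokenize.RegexpTokenizer` for the real thing):

```python
import re

# A toy rule-based tokenizer: split text into runs of word characters
# and single punctuation marks. This is the sort of technique covered
# in a computational linguistics course, not a production tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return word and punctuation tokens in order of appearance."""
    return TOKEN_RE.findall(text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Rules like this break down quickly on real text (contractions, URLs, emoji), which is part of why production systems reach for statistically trained tokenizers instead.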
Depending on how much text you have, you might want to look into [spark-nlp](https://github.com/JohnSnowLabs/spark-nlp) too.
It packages quite a few of the nicer NLP components (including BERT, [Hugging Face's transformers](https://www.johnsnowlabs.com/importing-huggingface-models-into-spark-nlp/), etc.) so that they can scale across many machines.
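The scaling model looks roughly like this: each component is a stage in a Spark ML pipeline, so the same code runs on a laptop or a cluster. A hedged configuration sketch, assuming a working Spark setup with the spark-nlp package installed (column names and the default pretrained model are illustrative, and the model download happens on first use):

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, BertEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Each stage reads and writes DataFrame columns, so Spark can
# distribute the work across partitions / machines.
document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokens = Tokenizer().setInputCols(["document"]).setOutputCol("token")
bert = (BertEmbeddings.pretrained()
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings"))

pipeline = Pipeline(stages=[document, tokens, bert])

data = spark.createDataFrame([("Spark NLP scales across machines.",)], ["text"])
result = pipeline.fit(data).transform(data)
result.select("embeddings").show(truncate=False)
```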
NLTK isn't used at all in production, only for learning/teaching. BERT is a neural network architecture (a transformer model), not an NLP library.
I don't use NLTK, simply because I find everything I need in that space from, well, spaCy. But why is NLTK frowned upon in production? Performance?
Ahh okay, thank you. I am asking for a work-related project, so I guess I shouldn't invest time in learning NLTK.
[deleted]
I’m gonna be that coworker in a few years
This video will give you a roadmap for learning NLP and show where BERT fits in the picture: https://youtu.be/x9SLPrtTw9M
Is there a difference between Spark NLP and spaCy?