Efficient BERT: Finding Your Optimal Model with Multimetric Bayesian Optimization
Before BERT, each core NLP task required its own architecture and training corpora to produce a high-performing model.
With the introduction of BERT, there was suddenly a single strong, generalizable model that could be transferred to a variety of tasks. Essentially, BERT lets many problems share an off-the-shelf pretrained model, moving NLP closer to standardization.
But BERT is very large: BERT-large has roughly 340 million parameters. Many teams have therefore compressed BERT to a more manageable size, aiming to retain its language-understanding performance while shrinking the model.
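The trade-off behind the multimetric search in the title can be sketched with a toy example. The sketch below (hypothetical names and numbers, not from the article) treats each candidate model as an (accuracy, parameter-count) pair and keeps only the Pareto-optimal configurations, i.e. those not beaten on both metrics at once:

```python
def pareto_front(candidates):
    """Return configs where no other config has accuracy at least as high
    AND parameter count at least as low, with a strict improvement on one."""
    front = []
    for name, acc, params in candidates:
        dominated = any(
            other_acc >= acc and other_params <= params
            and (other_acc > acc or other_params < params)
            for _, other_acc, other_params in candidates
        )
        if not dominated:
            front.append((name, acc, params))
    return front

# Illustrative candidates only; real values come from the optimization runs.
configs = [
    ("tiny",  0.78,  15_000_000),
    ("small", 0.84,  40_000_000),
    ("base",  0.88, 110_000_000),
    ("bloat", 0.83, 200_000_000),  # dominated: lower accuracy, more parameters
]
print(pareto_front(configs))
```

A multimetric Bayesian optimizer automates this idea: instead of enumerating fixed candidates, it proposes new configurations and refines the accuracy-versus-size frontier as results come in.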