Using Bert without training?

It is possible. In fact, Facebook AI Research (FAIR) has published papers on using Bert completely untrained, with randomly initialized weights, showing that the transformer architecture itself is enough to extract information, at least to some extent.
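To make the distinction concrete, here is a minimal sketch, assuming the Hugging Face `transformers` library (not mentioned in the answer), of instantiating a randomly initialized Bert next to a pretrained one:

```python
from transformers import BertConfig, BertModel

# Completely untrained Bert: only the architecture, weights are randomly initialized.
config = BertConfig()            # defaults match the bert-base-uncased hyperparameters
random_bert = BertModel(config)

# Pretrained Bert: downloads and loads the published weights.
pretrained_bert = BertModel.from_pretrained("bert-base-uncased")
```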

However, the appeal of Bert is its readily available pretrained models, and using it without those pretrained weights (or without training it yourself) largely defeats the purpose.

Bert is essentially a building block for your model. The idea is that you add very few layers on top of it (a single linear layer reached about 85% accuracy on a spam-classification task) and get a very good model with very little training, as sketched below. So unless you are doing the kind of research FAIR has published on feature extraction from completely untrained models, you will want to use a pretrained version of Bert.
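As an illustration of that "single linear layer on top" idea, here is a hedged sketch, again assuming Hugging Face `transformers` and PyTorch; the class name `SpamClassifier` is hypothetical, and the 85% figure comes from the answer, not from this code or any particular dataset:

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class SpamClassifier(nn.Module):
    """A frozen pretrained Bert with one linear classification layer on top."""

    def __init__(self, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # Freeze Bert so only the single linear layer is trained.
        for param in self.bert.parameters():
            param.requires_grad = False
        # One linear layer over the pooled [CLS] representation, 2 classes (spam / not spam).
        self.classifier = nn.Linear(self.bert.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)


tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = SpamClassifier()
batch = tokenizer(["Win a FREE prize now!!!"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

Only the classifier's weights would be updated during training here, which is why so little training is needed compared to training a transformer from scratch.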