Author: Mingda Chen.

Table of Links

Abstract
Acknowledgements
1 INTRODUCTION
1.1 Overview
1.2 Contributions
2 BACKGROUND
2.1 Self-Supervised Language Pretraining
2.2 Naturally-Occurring Data Structures
2.3 Sentence Variational Autoencoder
2.4 Summary
3 IMPROVING SELF-SUPERVISION FOR LANGUAGE PRETRAINING
3.1 Improving Language Representation Learning via Sentence Ordering Prediction
3.2 Improving In-Context Few-Shot Learning via Self-Supervised Training
3.
This allows us to train configurations that still have fewer parameters than BERT-large. The architecture is similar to BERT's, but we use a sentence ordering prediction (SOP) loss, which avoids topic prediction and instead focuses on modeling inter-sentence coherence. SOP constructs positive examples the same way as NSP (two consecutive segments from the same document) and constructs negatives by swapping the order of the two segments. The model achieves state-of-the-art results under two settings: single-model and ensemble. When ensembling models for the GLUE benchmark and RACE, we average the predictions of the ensemble members, where the candidates are fine-tuned from different training steps using the 12-layer and 24-layer architectures. For SQuAD, we average the prediction scores for those spans that have multiple probabilities; we also average the scores of the "unanswerable" decision.
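The prediction averaging used for the GLUE and RACE ensembles can be sketched as follows. This is a minimal illustration, not the actual evaluation code: the helper name `ensemble_predict` and the toy probability arrays are assumptions, and it covers only the classification case (SQuAD span-score averaging works analogously over span scores).

```python
import numpy as np

def ensemble_predict(prob_matrices):
    """Average per-model class probabilities and take the argmax.

    prob_matrices: list of (num_examples, num_classes) arrays, one per
    fine-tuned checkpoint (hypothetical inputs for illustration).
    """
    avg = np.mean(np.stack(prob_matrices, axis=0), axis=0)
    return avg.argmax(axis=-1)

# Three toy "checkpoints"; the second example is a disagreement case.
m1 = np.array([[0.9, 0.1], [0.4, 0.6]])
m2 = np.array([[0.8, 0.2], [0.7, 0.3]])
m3 = np.array([[0.7, 0.3], [0.6, 0.4]])
print(ensemble_predict([m1, m2, m3]).tolist())  # averaged vote per example
```

Averaging probabilities (rather than majority-voting hard labels) lets confident checkpoints outweigh uncertain ones, which is why the disagreement case above resolves to the first class.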
The result is an architecture that has significantly fewer parameters than a traditional BERT architecture. As a result of these design choices, we are able to scale up to much larger configurations. Concurrently to this work, Wang et al. use a loss based on predicting whether the second segment in a pair has been swapped with a segment from another document. We compare to this loss in our experiments and find that sentence ordering is a more challenging pretraining task and more useful for certain downstream tasks.
BERT is trained jointly with MLM and NSP. NSP is a binary classification task for predicting whether two segments appear consecutively in the original text. Positive examples are created by taking consecutive segments from the training corpus. Negative examples are created by pairing segments from different documents. Positive and negative examples are sampled with equal probability. Later studies show that NSP has little impact on improving downstream task performance.

Sentence Ordering Prediction.
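The contrast between the two pretraining objectives can be made concrete with a small sketch of how training pairs are constructed. The toy corpus and helper names below are hypothetical; the labeling logic follows the descriptions above (NSP: negatives come from a different document; SOP: negatives are the same consecutive pair with the order swapped).

```python
import random

# Toy corpus: each inner list is one document split into segments
# (hypothetical data for illustration only).
docs = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]

def make_nsp_example(docs, doc_idx, seg_idx, rng):
    """NSP pair: positive = two consecutive segments from one document;
    negative = second segment drawn from a different document.
    Positives and negatives are sampled with equal probability."""
    seg_a = docs[doc_idx][seg_idx]
    if rng.random() < 0.5:
        return seg_a, docs[doc_idx][seg_idx + 1], 1  # positive: consecutive
    other = rng.choice([i for i in range(len(docs)) if i != doc_idx])
    return seg_a, rng.choice(docs[other]), 0  # negative: cross-document

def make_sop_example(docs, doc_idx, seg_idx, rng):
    """SOP pair: positive = the same consecutive pair in original order;
    negative = that pair with its order swapped."""
    seg_a, seg_b = docs[doc_idx][seg_idx], docs[doc_idx][seg_idx + 1]
    if rng.random() < 0.5:
        return seg_a, seg_b, 1  # original order
    return seg_b, seg_a, 0  # swapped order

rng = random.Random(0)
print(make_sop_example(docs, 0, 0, rng))
```

Note that a SOP negative uses the same two segments as the positive, so topic cues cannot separate the classes; the model must rely on discourse coherence, which is what makes the task harder than NSP.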