Learning Discourse-Aware Sentence Representations from Document Structures

Source: hackernoon

In this study, researchers exploit rich, naturally-occurring structures on Wikipedia for various NLP tasks.

Author: Mingda Chen.

Table of Links

  • Abstract
  • Acknowledgements
  • 1 INTRODUCTION: 1.1 Overview; 1.2 Contributions
  • 2 BACKGROUND: 2.1 Self-Supervised Language Pretraining; 2.2 Naturally-Occurring Data Structures; 2.3 Sentence Variational Autoencoder; 2.4 Summary
  • 3 IMPROVING SELF-SUPERVISION FOR LANGUAGE PRETRAINING: 3.1 Improving Language Representation Learning via Sentence Ordering Prediction; 3.2 Improving In-Context Few-Shot Learning via Self-Supervised Training

For BERT, we use the averaged vector at the position of the [CLS] token across all layers. We also evaluate per-layer performance for both models in Section 4.2.5. When reporting results for SentEval, we compute the averaged Pearson correlation over the Semantic Textual Similarity tasks from 2012 to 2016. We refer to this average as unsupervised semantic similarity, since those tasks do not require training data.
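As a rough illustration of this pooling strategy (a sketch using the HuggingFace transformers API, not the authors' code; the model name and helper function are assumptions for illustration only), the snippet below averages the hidden state at the [CLS] position over all of BERT's layers to form a single sentence vector.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def cls_averaged_over_layers(sentence: str) -> torch.Tensor:
    """Return the [CLS] vector averaged across all hidden layers."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of (num_layers + 1) tensors,
    # each of shape (batch, seq_len, hidden_size); position 0 is [CLS].
    cls_per_layer = torch.stack([h[0, 0] for h in outputs.hidden_states])
    return cls_per_layer.mean(dim=0)

vec = cls_averaged_over_layers("Discourse structure relates neighboring sentences.")
print(vec.shape)  # torch.Size([768]) for bert-base
```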

For BERT-base, we use the vector from the position of the [CLS] token. Fig. 4.4 shows the heatmap of performance for individual hidden layers; note that, for better visualization, colors in each column are standardized. On SentEval, BERT-large differs by a large margin, while within some specific domains, for example Wiki in BSO, we observe ...
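To make the column-wise standardization concrete, here is a small plotting sketch with placeholder numbers (not the paper's results): each task column is z-scored independently before coloring, so the heatmap highlights which layers do relatively well on each task rather than comparing absolute scores across tasks.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical scores: 12 hidden layers (rows) x 7 evaluation tasks (columns).
scores = rng.uniform(60, 90, size=(12, 7))

# Standardize each column (task) separately, so colors compare layers
# within a task instead of raw score differences between tasks.
standardized = (scores - scores.mean(axis=0)) / scores.std(axis=0)

fig, ax = plt.subplots()
im = ax.imshow(standardized, aspect="auto", cmap="viridis")
ax.set_xlabel("task")
ax.set_ylabel("hidden layer")
fig.colorbar(im, ax=ax)
plt.show()
```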

BERT and Skip-thought vectors, which have training losses explicitly related to surrounding sentences, perform much more strongly than their respective prior work, demonstrating the effectiveness of incorporating losses that make use of broader context.

4.2.2 Related Work

Discourse modeling and discourse parsing have a rich history, much of it based on recovering linguistic annotations of discourse structure.

Comparing BERT to ELMo and Skip-thought to InferSent on DiscoEval, we can see the benefit of adding information about neighboring sentences. Our proposed training objectives show complementary improvements over NSP, which suggests that they can potentially benefit these pretrained representations.

4.2.5 Analysis

Per-layer analysis. To investigate the performance of individual hidden layers, we evaluate ELMo and BERT.
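A rough sketch of what such a per-layer evaluation loop might look like is below; the probe is a hypothetical placeholder, not the actual SentEval or DiscoEval classifiers, and the model name is assumed for illustration. The idea is simply to extract one sentence vector per hidden layer and score each layer separately.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

def per_layer_cls_vectors(sentence: str):
    """One [CLS] vector per hidden layer (embedding layer included)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states
    return [h[0, 0] for h in hidden_states]

def score_each_layer(layer_vectors):
    # Hypothetical stand-in for a downstream probe; a real evaluation would
    # train a classifier on each layer's vectors and report task accuracy.
    return [float(v.norm()) for v in layer_vectors]

for layer, score in enumerate(score_each_layer(per_layer_cls_vectors("An example sentence."))):
    print(f"layer {layer}: {score:.2f}")
```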

 

