COSAEMB: Contrastive Section-aware Aspect Embeddings for Scientific Articles

Source
SDP 2024: 4th Workshop on Scholarly Document Processing, Proceedings of the Workshop
Date Issued
2024-01-01
Author(s)
Singh, Shruti
Singh, Mayank  
Abstract
Research papers are long documents that cover several aspects, such as background, prior work, methodology, and results. Existing work on scientific document representation learning leverages only the title and abstract of a paper. We present COSAEMB, a model that learns representations from the full text of 97,402 scientific papers from the S2ORC dataset. We introduce a novel supervised contrastive training framework for long documents using triplet loss and margin gradation. Our framework can be used to learn representations of long documents with any existing encoder-only transformer model without retraining it from scratch. COSAEMB shows improved performance on information retrieval from a paper’s full text compared with models trained only on paper titles and abstracts. We also evaluate COSAEMB on the SCIREPEVAL and CSFCube benchmarks, showing performance comparable to existing state-of-the-art models.
URI
https://d8.irins.org/handle/IITG2025/29139