Title: COSAEMB: Contrastive Section-aware Aspect Embeddings for Scientific Articles
Authors: Singh, Shruti; Singh, Mayank
Type: Conference Paper (Conference Proceeding)
Pages: 283-292
ISBN: 9798891761513
Scopus ID: 2-s2.0-85204921965
URI: https://d8.irins.org/handle/IITG2025/29139
Date issued: 2024-01-01
Date accessioned: 2025-08-31

Abstract: Research papers are long documents that contain information about many aspects, such as background, prior work, methodology, and results. Existing work on scientific document representation learning leverages only the title and abstract of a paper. We present COSAEMB, a model that learns representations from the full text of 97,402 scientific papers from the S2ORC dataset. We present a novel supervised contrastive training framework for long documents using triplet loss and margin gradation. Our framework can be used to learn representations of long documents with any existing encoder-only transformer model without retraining it from scratch. COSAEMB shows improved performance on information retrieval from a paper's full text compared to models trained only on paper titles and abstracts. We also evaluate COSAEMB on the SCIREPEVAL and CSFCube benchmarks, showing performance comparable with existing state-of-the-art models.
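The abstract describes supervised contrastive training with a triplet loss and margin gradation. A minimal sketch of a graded triplet margin loss is shown below; the function name, the distance measure, the margin values, and the three-level gradation scheme are all illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Triplet margin loss on L2 distances.

    Encourages d(anchor, positive) + margin <= d(anchor, negative);
    the loss is zero once that separation is achieved.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical margin gradation: negatives that are semantically closer to the
# anchor (e.g., another section of the same paper) get a smaller margin than
# negatives drawn from unrelated papers. Values here are arbitrary examples.
MARGINS = {
    "same_paper_other_section": 0.2,
    "related_paper_section": 0.5,
    "random_paper_section": 1.0,
}

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=8), rng.normal(size=8)
graded_losses = {
    level: triplet_loss(anchor, positive, rng.normal(size=8), margin)
    for level, margin in MARGINS.items()
}
```

In a full training setup, `anchor`, `positive`, and `negative` would be section embeddings produced by the encoder, and the graded losses would be summed (or averaged) across the negative levels before backpropagation.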