Names: Singh, Mayank; Lodwal, Hitesh
Dates: 2025-09-04; 2024-01-01
URI: https://d8.irins.org/handle/IITG2025/32011
Physical description: hbk.; 30 cm
Keywords: Web Scraping; Deduplication-SimHash; Tokenizer-SentencePiece Byte Pair Encoding
Title: Data curation for Indic language
Degree: M.Tech
Extent: xi, 39p.
Identifier: 123456789/440