Abstract
This paper presents an extractive text summarization method designed specifically for Sindhi, a culturally rich but low-resource Indo-Aryan language spoken widely in Pakistan. The study focuses on selecting the most relevant sentences from Sindhi texts, employing Natural Language Processing (NLP) techniques to generate concise summaries. The proposed system incorporates essential preprocessing steps, including text cleaning, tokenization, and stemming/lemmatization. For feature extraction, it uses TF-IDF and sentence embeddings. After the sentences are scored, the most significant ones are selected to form the final summary. The system's performance is evaluated on five test paragraphs using several metrics: precision, recall, F1 score, cosine similarity, normalized Levenshtein distance, and accuracy. The system demonstrates reliable and accurate summarization, consistently achieving high precision (1.0), a strong F1 score (0.89-0.92), a low normalized error (0.04), and an overall accuracy of 0.86. Graphical analysis further confirms the model's coherence, semantic retention, and low error rates. By leveraging NLP for information summarization, this study contributes to preserving and promoting the Sindhi language, with potential applications including digital accessibility, education, and content curation. Future research aims to enhance contextual understanding by exploring transformer-based models such as BERT and by extending the approach to abstractive summarization.
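The scoring-and-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_sentences`, `tokenize`, `tfidf_scores`, `summarize`), the set of sentence terminators, the smoothed IDF formula, and the choice to score a sentence by the average TF-IDF weight of its distinct terms are all assumptions made for illustration. Stemming/lemmatization and sentence embeddings, which the paper also uses, are omitted.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumed terminators: Latin . ! ? plus the Arabic-script full stop and question mark.
    parts = re.split(r"(?<=[.!?\u06d4\u061f])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Simplification: \w+ keeps Unicode letters and digits; a real Sindhi
    # pipeline would need a proper tokenizer and a stemmer or lemmatizer.
    return re.findall(r"\w+", sentence.lower())


def tfidf_scores(sentences):
    # Treat each sentence as a "document" when computing document frequency.
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        total = sum(
            (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        )
        # Average over distinct terms so long sentences are not favored by length alone.
        scores.append(total / len(tf))
    return scores


def summarize(text, k=2):
    # Pick the k highest-scoring sentences, then restore their original order.
    sentences = split_sentences(text)
    scores = tfidf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=2)` returns up to two sentences from `text` in their original order. Restoring document order after ranking is a common choice in extractive summarization because it keeps the summary readable.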
Suggested Citation
Aqsa Memon, Zainab Memon, Akhtar Hussain Jalbani, 2025.
"Extractive Text Summarization-Based Framework for Sindhi Language,"
International Journal of Innovations in Science & Technology, 50sea, vol. 7(6), pages 147-155, May.
Handle:
RePEc:abq:ijist1:v:7:y:2025:i:6:p:147-155