Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm

publication date

  • June 2022

start page

  • 1

end page

  • 23

issue

  • 13

volume

  • 12

International Standard Serial Number (ISSN)

  • 2076-3417

abstract

  • Automatic Text Summarization (ATS) is in great demand today as a way to cope with the ever-growing
    volume of text available online and to find relevant information faster. This research proposes an
    ATS methodology for the Hindi language using a Real Coded Genetic Algorithm (RCGA), applied to a
    health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing,
    feature extraction, processing, sentence ranking, and summary generation. Rigorous experiments are
    performed on varied feature sets, in which two distinguishing features, sentence similarity and
    named entities, are combined with the other features to compute the evaluation metrics. The top 14
    feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation
    (ROUGE) measure. The RCGA computes appropriate feature weights through strings of features,
    chromosome selection, and two reproduction operators: Simulated Binary Crossover and Polynomial
    Mutation. Different compression rates are tested for extracting the highest-scored sentences as the
    corpus summary. Compared with existing summarization tools, the proposed extractive ATS method
    achieves a summary reduction of 65%.
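The abstract names the two RCGA reproduction operators, Simulated Binary Crossover (SBX) and Polynomial Mutation. The sketch below shows the common textbook form of both operators acting on real-valued feature-weight vectors; it is an illustration, not the authors' implementation. The weight bounds [0, 1], the distribution indices eta_c = 15 and eta_m = 20, the default per-gene mutation probability 1/len(x), and the choice to simply clip out-of-range genes (instead of using the bounded SBX variant) are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Assumed bounds for every feature weight (illustrative, not from the paper).
LOWER, UPPER = 0.0, 1.0


def _clip(x: float) -> float:
    """Keep a gene inside [LOWER, UPPER]."""
    return min(max(x, LOWER), UPPER)


def sbx_crossover(p1: Sequence[float], p2: Sequence[float], eta_c: float = 15.0,
                  rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Simulated Binary Crossover: children are spread symmetrically around the
    parents by a factor beta drawn from a polynomial distribution (index eta_c)."""
    if rng is None:
        rng = random.Random()
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()  # u in [0, 1)
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
        c1.append(_clip(0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)))
        c2.append(_clip(0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)))
    return c1, c2


def polynomial_mutation(x: Sequence[float], eta_m: float = 20.0, pm: Optional[float] = None,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Polynomial Mutation: with probability pm, shift a gene by delta * (UPPER - LOWER),
    where delta lies in [-1, 1] and concentrates near 0 as eta_m grows."""
    if rng is None:
        rng = random.Random()
    if pm is None:
        pm = 1.0 / max(1, len(x))
    out = []
    for xi in x:
        if rng.random() < pm:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
            xi = _clip(xi + delta * (UPPER - LOWER))
        out.append(xi)
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical 5-feature weight vectors (chromosomes).
    parent_a = [rng.random() for _ in range(5)]
    parent_b = [rng.random() for _ in range(5)]
    child_a, child_b = sbx_crossover(parent_a, parent_b, rng=rng)
    print(polynomial_mutation(child_a, rng=rng))
```

Larger eta_c and eta_m keep offspring closer to their parents, which is why these indices act as the exploration knobs of a real-coded GA.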
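The sentence-ranking and summary-generation phases can be pictured as a weighted sum of per-sentence feature values followed by a top-k cut set by the compression rate. The sketch below assumes that a feature matrix has already been computed (one row per sentence), that the weights are the ones found by the RCGA, that sentences are split on the Devanagari danda (।), '?' and '!', and that the compression rate is the fraction of sentences kept. The helper names and the 0.3 default are hypothetical and not taken from the paper.

```python
import math
import re
from typing import List, Sequence


def split_sentences(text: str) -> List[str]:
    """Split Hindi text wherever a danda (।), '?' or '!' is followed by whitespace."""
    return [s for s in re.split(r"(?<=[।?!])\s+", text.strip()) if s]


def score_sentences(features: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Score each sentence as the weighted sum of its feature values."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]


def extract_summary(sentences: Sequence[str], features: Sequence[Sequence[float]],
                    weights: Sequence[float], compression_rate: float = 0.3) -> List[str]:
    """Keep the ceil(compression_rate * n) highest-scoring sentences, in document order."""
    n = len(sentences)
    if n == 0:
        return []
    k = max(1, math.ceil(compression_rate * n))
    scores = score_sentences(features, weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, with four sentences, two features, weights [0.6, 0.4] and compression_rate=0.5, the two highest-scoring sentences are returned in their original order, which keeps the extractive summary readable.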
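The feature combinations described in the abstract are compared with ROUGE. To make the recall-oriented idea concrete, the sketch below computes ROUGE-N recall: the number of reference n-grams that also occur in the candidate summary (counts clipped to the candidate's counts), divided by the total number of reference n-grams. Whitespace tokenization and the function name are assumptions; the abstract does not state which ROUGE variant or toolkit settings were used, and reported scores are normally produced with an established ROUGE implementation.

```python
from collections import Counter
from typing import Sequence, Tuple


def rouge_n_recall(candidate: Sequence[str], reference: Sequence[str], n: int = 1) -> float:
    """ROUGE-N recall = clipped n-gram overlap / number of n-grams in the reference."""

    def ngrams(tokens: Sequence[str]) -> Counter:
        # Counter of n-gram tuples; empty when len(tokens) < n.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand: Counter = ngrams(candidate)
    ref: Counter = ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # A missing key in a Counter counts as 0, so unmatched n-grams add nothing.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

Because the denominator is the reference length, a longer candidate can only raise recall, which is why ROUGE recall is usually read together with the compression rate or with precision.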

subjects

  • Computer Science

keywords

  • automatic text summarization; extractive summary; feature set; hindi language; hindi; health data; named entity; real coded genetic algorithm; rouge metric; summarization tool