Text Mining for Bioinformatics

Graduate Course, Instituto de Salud Carlos III (ISCIII), 2017

Designed and delivered an intensive 15-hour module on Text Mining and Natural Language Processing (NLP) within the Master in Bioinformatics and Computational Biology program.

Course Highlights:

  • Large-scale Data Acquisition: Retrieval of records from PubMed and PMC using Biopython, Requests, and the NCBI Entrez E-utilities (ESearch, EFetch).
  • Data Parsing & Processing: Handling of XML and JSON documents, using XPath to query XML and structured data mapping.
  • NLP with NLTK: Practical implementation of linguistic processing pipelines, including:
    • Sentence segmentation and tokenization.
    • N-grams and inverted indexing for efficient search.
    • Normalization (lemmatization and stemming), POS tagging, and chunking.
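
To give a flavor of the pipeline outlined above, here is a minimal sketch that is not taken from the course materials. It uses only the Python standard library so that it runs without downloads; the course itself used NLTK, whose tokenizers are considerably more robust than the regular expressions here. The sample XML is hand-written and only loosely shaped like a PubMed record, and all function names (parse_abstracts, split_sentences, tokenize, ngrams, build_inverted_index) are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified records; real PubMed XML nests these fields deeper.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>1</PMID>
    <AbstractText>BRCA1 regulates DNA repair. Mutations in BRCA1 increase cancer risk.</AbstractText>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>2</PMID>
    <AbstractText>TP53 mutations are common in cancer.</AbstractText>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_abstracts(xml_text):
    """Map each PMID to its abstract text using ElementTree's XPath subset."""
    root = ET.fromstring(xml_text.strip())
    docs = {}
    for article in root.findall("./PubmedArticle"):
        pmid = article.findtext("PMID")
        docs[pmid] = article.findtext("AbstractText", default="")
    return docs


def split_sentences(text):
    """Naive segmentation: split on whitespace that follows ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sentence in split_sentences(text):
            for token in tokenize(sentence):
                index[token].add(doc_id)
    return index


docs = parse_abstracts(SAMPLE_XML)
index = build_inverted_index(docs)
first_sentence = split_sentences(docs["1"])[0]
bigrams = ngrams(tokenize(first_sentence), 2)
```

Looking up index["cancer"] returns the IDs of both sample records, while index["brca1"] returns only the first. In practice the regex-based steps would be swapped for NLTK's sentence and word tokenizers, and the XML would come from an EFetch call rather than a literal string.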

Skills: Python, NLP, Bioinformatics, Information Retrieval, API Integration.