Text Mining for Bioinformatics
Graduate Course, Instituto de Salud Carlos III (ISCIII), 2017
Designed and delivered an intensive 15-hour module on Text Mining and Natural Language Processing (NLP) within the Master in Bioinformatics and Computational Biology program.
Course Highlights:
- Large-scale Data Acquisition: Extraction from PubMed and PMC repositories using BioPython, Requests, and the Entrez API (ESearch, EFetch).
- Data Parsing & Processing: Advanced handling of XML and JSON documents using XPath and structured data mapping.
- NLP with NLTK: Practical implementation of linguistic processing pipelines, including:
  - Sentence segmentation & Tokenization.
  - N-grams and Inverted Indexing for search optimization.
  - Normalization (Lemmatization/Stemming), POS-Tagging, and Chunking.
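
To illustrate the n-gram and inverted indexing topic listed above, here is a minimal sketch in plain Python. It is not the original course material: the function names (`tokenize`, `ngrams`, `build_inverted_index`) and the three sample sentences are made up for illustration, and the regex tokenizer is a crude stand-in for an NLTK tokenizer.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters/digits (simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_inverted_index(docs, n=1):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(tokenize(text), n):
            index[gram].add(doc_id)
    return index


# Hypothetical sample documents, invented for this example.
docs = {
    "doc1": "BRCA1 mutations increase breast cancer risk.",
    "doc2": "Breast cancer screening guidelines were updated.",
    "doc3": "TP53 mutations occur in many tumours.",
}

bigram_index = build_inverted_index(docs, n=2)
print(sorted(bigram_index[("breast", "cancer")]))  # ['doc1', 'doc2']
```

Looking up a phrase then becomes a dictionary access instead of a scan over every document, which is the search optimization that inverted indexes provide.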
Skills: Python, NLP, Bioinformatics, Information Retrieval, API Integration.
