TEAM WORK

     NLTK is arguably one of the best tools available for removing stopwords. It is widely used and well supported, so I decided to use it for our project. While NLTK did its job well, we still ended up with many stopwords that confused our model and ruined the results of our topic modeling. The reason is that we are working with two-century-old data. Some words have changed over time, and others are specific to a country or region, such as Scotland. Words such as "otter" or "weel" are not in NLTK's stopword list, so they were not removed. One of my supervisors, an English professor who is also the head of the project, therefore compiled a list of these additional stopwords by hand and gave it to me so that I could add them to the code.
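     To give an idea of what "adding them to the code" can look like, here is a minimal sketch, not our actual project code. It assumes the NLTK stopwords corpus has been downloaded (nltk.download("stopwords")) and falls back to a tiny stand-in set if it has not. The names CUSTOM_STOPWORDS and remove_stopwords are hypothetical, the custom set holds only the two words mentioned above rather than the full list from my supervisor, and the sample sentence is made up.

```python
import re


def load_base_stopwords():
    """Return NLTK's English stopwords, or a small stand-in set if NLTK is unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # Assumption: a tiny fallback so the sketch still runs without the NLTK corpus.
        return {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


# Hypothetical excerpt of the hand-curated list; the real list is much longer.
CUSTOM_STOPWORDS = {"otter", "weel"}


def remove_stopwords(text, extra=CUSTOM_STOPWORDS):
    """Lowercase and tokenize the text, then drop standard and custom stopwords."""
    stop = load_base_stopwords() | set(extra)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stop]


# Made-up sample sentence, only for illustration.
sample = "It was weel known that the otter road led to the kirk"
filtered = remove_stopwords(sample)
print(filtered)
```

     The point of the design is that the expert's list is merged with the standard one as a set union, so the list can keep growing without touching the filtering logic.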

    This is a prime example of why it helps to have people with different areas of expertise working on a project, and it is also what makes working on an individual project so difficult. I realized this because I am planning to create a text generator in my native language, Malagasy. I know the coding and NLP part, but I am very bad at grammar. Thankfully, my mother teaches the language at a high school, so collecting the grammar data I need from her will make the project much more realistic.

    We are also starting a data science group on campus, and the co-leads are from different departments such as business, biology, and math. We will all work on a shared project to which each student contributes expertise from their own domain.
