Posts

Showing posts from May, 2022

APPLYING DIFFERENT THOUGHTS

Testing out a way to filter the books by year and country of release

I have been having difficulty figuring out the best way to filter for the books released in the UK within a specific time frame. The metadata from Project Gutenberg does not provide either the country of origin or the release date. As described in my last blog post, I have to collect this information in different ways and combine the results. While looking at the metadata again over the weekend, I realized that it provides the years of birth and death for most of the authors. Of the more than 17,000 books in English on Project Gutenberg, around 12,000 come with the year of birth and death of their authors. While thinking about how to filter the books, I remembered one key piece of advice from one of my supervisors: if I cannot find any information about a particular book, any information about its author may be helpful. So I realized that even if I do not know when the book was released, I am certai...
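The idea of falling back on the author's dates can be illustrated with a minimal Python sketch. Since the excerpt is cut off, the exact rule is an assumption: here a book is kept when its author's lifetime overlaps the target period. The `Book` class, the `may_fall_in_period` function, the placeholder titles, and the 1800 to 1900 period are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    # Hypothetical record; the field names are illustrative, not Gutenberg's schema.
    title: str
    author_birth: Optional[int]
    author_death: Optional[int]


def may_fall_in_period(book: Book, start: int, end: int) -> bool:
    """Return True if the author's lifetime overlaps the years [start, end].

    Assumption: a book cannot be written before its author is born, so a
    lifetime that never overlaps the period rules the book out.
    Books with missing author dates are left out of this pass.
    """
    if book.author_birth is None or book.author_death is None:
        return False
    return book.author_birth <= end and book.author_death >= start


# Placeholder data, invented purely for illustration.
books = [
    Book("Book A", 1812, 1870),
    Book("Book B", 1564, 1616),
    Book("Book C", None, None),
]

# Hypothetical target period.
candidates = [b.title for b in books if may_fall_in_period(b, 1800, 1900)]
print(candidates)
```

A filter like this only narrows the candidates. A posthumous edition, or an author whose life only partly overlaps the period, would still need a real release date from another source.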

ASSEMBLING THE PIECES

Looking for a way (or ways) to tackle a new issue we are facing

As we progress in collecting data, new issues emerge that need to be tackled. So far, I have collected all the English books from the Project Gutenberg database. Separating out the English books is a major success, as we are now working with only 17 thousand books instead of 60 thousand, which is the total number of books in the database. However, we do not need all the books in English. We only need the ones that were released in the U.K. within a specific time period. The biggest challenge is that the Project Gutenberg database does provide both of these pieces of information, but the release date is not always accurate. So, after a discussion with the team, we decided to use other resources to gather the country of origin and the release date. We decided to go with Wikidata because it is easy to query with SPARQL; I spent the morning of Friday, May 27th, 2022, learning the new query lan...
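As a rough illustration of what such a Wikidata lookup might look like, the Python sketch below only builds a SPARQL query string and does not send it anywhere. It is not the query used in the project: the `build_query` helper and the 1800 to 1900 range are hypothetical, and the choice of properties is an assumption (`P495` is Wikidata's "country of origin", `P577` is "publication date", and `Q145` is the United Kingdom).

```python
def build_query(start_year: int, end_year: int, limit: int = 10) -> str:
    """Build a SPARQL query for works from the U.K. published in a year range.

    Hypothetical helper: the time period is a parameter because the
    post does not state the actual range.
    """
    # Doubled braces are literal braces inside an f-string.
    return f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P495 wd:Q145 ;
        wdt:P577 ?date .
  FILTER(YEAR(?date) >= {start_year} && YEAR(?date) <= {end_year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


# Hypothetical period, chosen only for the example.
query = build_query(1800, 1900)
print(query)
```

The printed query can be pasted into the Wikidata Query Service to inspect the results by hand before automating anything.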

THE PROCESS

A new way of working on a project

So far, this internship has made me realize how different actual research is from a personal project. Most of the time, I would do a personal project to practice a concept I already knew, and I would pick a project idea based on my skill level. With this project, however, we start with the end goal and then work out how to reach it. For the past three days, I have been reading and studying the results of research by a team from Cornell University as part of the process of finding the best method to collect our data. This part is mainly my task, and I am supposed to compare different methods of collecting the data in order to find the most efficient one.

This way of working is very new to me. For once, I had to forget about everything I already knew, learn how to learn from other people's research, and use as many of the available resources as possible. This process was very stressful for me on the first day...

THE BEGINNING

Kicking off my research internship with the team at Vanderbilt University

The official start date of my internship is today, May 23rd, 2022, but the internship actually started a few days ago when I, along with Dr. Nakazawa, visited our colleagues over at Vanderbilt University on May 18th and 19th, 2022. So this very first blog post will focus on that trip and my experience.

The visit had two primary purposes: to meet the team with whom I will be working over the summer, and to discuss the project's different steps and my main tasks. However, we also had time to tour the campus, including places such as the Data Science Institute, the Makerspace, and the Retro Computer Exhibit. We had time for lunch and dinner with each of the two project leads. We went to get breakfast at a restaurant with a retro look, where I saw a jukebox for the first time. But despite the cool places we visited, the most valuable experience for me was the time spent with each of t...