ASSEMBLING THE PIECES


Looking for a way (or ways) to tackle a new issue we are facing

    As we progress in collecting data, new issues emerge that need to be tackled. So far, I have collected all the English books from the Project Gutenberg database. Separating out the English books is a major success, as we are now working with only 17 thousand books instead of 60 thousand, the total number of books in the database. However, we do not need every book in English. We only need the ones that were released in the U.K. within a specific time period. The biggest challenge is that, although Project Gutenberg records both pieces of information, its release dates are not always accurate. So, after a discussion with the team, we decided to use other resources to gather the country of origin and the release date. We chose Wikidata because it is easy to query with SPARQL; I spent the morning of Friday, May 27th, 2022, learning the new query language before starting to query data.
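    To give an idea of what such a Wikidata query looks like, here is a minimal Python sketch that only builds a SPARQL query and the request URL for the public endpoint; it is an illustration, not the exact query used. It assumes the Wikidata properties P2034 (Project Gutenberg ebook ID), P407 (language of work or name), P577 (publication date) and P495 (country of origin), and the item Q1860 (English). The function names are hypothetical, and the 1800 to 1900 range is a placeholder, since the exact timeframe is not stated here.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_english_books_query(start_year: int, end_year: int) -> str:
    """Build a SPARQL query for English-language items that have a Project
    Gutenberg ebook ID and a publication date in [start_year, end_year].
    The country of origin is OPTIONAL, so items without it are still returned."""
    return f"""
SELECT ?book ?bookLabel ?gutenbergId ?pubDate ?countryLabel WHERE {{
  ?book wdt:P2034 ?gutenbergId ;
        wdt:P407 wd:Q1860 ;
        wdt:P577 ?pubDate .
  OPTIONAL {{ ?book wdt:P495 ?country . }}
  FILTER(YEAR(?pubDate) >= {start_year} && YEAR(?pubDate) <= {end_year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Placeholder timeframe; replace with the project's actual range.
    query = build_english_books_query(1800, 1900)
    print(build_request_url(query))
```

    Sending the request (for example with `urllib.request`) is left out; Wikimedia asks automated clients to identify themselves with a descriptive User-Agent header. Note that a query like this silently drops any item missing one of the required (non-OPTIONAL) properties, which is one possible reason a result can come back smaller than expected.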

    After hours of work, I found that only around 4 thousand English books released within our timeframe are recorded on Wikidata. This result led me to conclude that not all 17 thousand books have been recorded there, so I have to find a new way to gather all the information I need accurately. I need to come up with something before reporting next week. After taking a break to think about possible ways to tackle the issue, I came up with a solution: collecting data from different sources and assembling them like pieces of a puzzle. So far, I have three sources: Project Gutenberg, Wikidata, and WorldCat. I will collect the country of origin from Project Gutenberg and use it to filter the data further. With this smaller dataset, I will collect the release years from Project Gutenberg, Wikidata, and WorldCat, compare them, check whether any book has multiple volumes, and, if so, take the release date of the first volume instead.
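    The puzzle-assembly plan can be sketched in Python as follows. This is a minimal sketch under assumptions, not a finished pipeline: the `Book` and `Volume` classes, the `"UK"` country code, and the source keys are hypothetical, and the rule for resolving disagreements (take the earliest year reported for the first volume and flag whether the sources agree) is an assumed choice, since the comparison rule is still to be decided.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Volume:
    number: int  # 1 = first volume; single-volume books only have volume 1
    # Release year reported by each source (hypothetical keys such as
    # "gutenberg", "wikidata", "worldcat"); None when a source has no year.
    years: dict[str, Optional[int]] = field(default_factory=dict)


@dataclass
class Book:
    book_id: str            # hypothetical identifier, e.g. a Gutenberg ebook number
    country: Optional[str]  # country of origin taken from Project Gutenberg
    volumes: list[Volume] = field(default_factory=list)


@dataclass
class Reconciled:
    year: Optional[int]  # chosen release year, or None if no source has one
    sources_agree: bool  # True only if every reported year is identical


def reconcile(book: Book) -> Reconciled:
    """Pick a release year from the book's first volume (assumed rule: earliest year)."""
    if not book.volumes:
        return Reconciled(year=None, sources_agree=False)
    first = min(book.volumes, key=lambda v: v.number)
    known = [y for y in first.years.values() if y is not None]
    if not known:
        return Reconciled(year=None, sources_agree=False)
    return Reconciled(year=min(known), sources_agree=len(set(known)) == 1)


def select_books(books: list[Book], country: str, start: int, end: int) -> dict[str, Reconciled]:
    """Filter by country first, then keep books whose reconciled year is in [start, end]."""
    selected: dict[str, Reconciled] = {}
    for book in books:
        if book.country != country:
            continue  # step 1: the country filter shrinks the dataset
        result = reconcile(book)  # step 2: compare sources on the first volume
        if result.year is not None and start <= result.year <= end:
            selected[book.book_id] = result
    return selected


if __name__ == "__main__":
    # Toy, made-up records purely to exercise the logic.
    books = [
        Book("A", "UK", [Volume(2, {"gutenberg": 1862, "wikidata": 1862}),
                         Volume(1, {"gutenberg": 1861, "wikidata": 1860, "worldcat": 1860})]),
        Book("B", "US", [Volume(1, {"gutenberg": 1850, "wikidata": 1850})]),
        Book("C", "UK", [Volume(1, {"gutenberg": None, "wikidata": None})]),
    ]
    for book_id, r in select_books(books, "UK", 1800, 1900).items():
        print(book_id, r.year, "agree" if r.sources_agree else "check by hand")
    # prints: A 1860 check by hand
```

    Flagging disagreements rather than silently picking a year keeps conflicting books visible, so they can be checked by hand later.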

    It will be a large amount of work that needs to be completed in a short period of time, but I am glad that I finally have a plan and know where to start.
