DATA PROCESSING
Today, for the first time, I manipulated data that I would consider fairly big: the collection of all Project Gutenberg books, which has a total size of 31GB. My task was to process the books and build metadata for Project Gutenberg, including the full text, title, and author of each book. When I wrote the code, I wrote it all at once and debugged on the go whenever an issue occurred. This is how I usually work, but it was not the best method this time. Because of the large amount of data, a single run takes several hours, so running everything and then stopping midway to debug was time-consuming. Sometimes my script ran for hours before I realized that something was not working, so I had to stop it, edit the code, and run it from the beginning again. It took me about a week to complete the task, and the final version of the code is still running as I write this prompt. It made me realize that with data of this size, it is much...
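As a rough illustration of the kind of pipeline described above, here is a minimal sketch of extracting title and author metadata from Project Gutenberg plain-text files. It is not the code I actually ran; the directory layout (`gutenberg/`), the output file name (`metadata.jsonl`), and the reliance on the `Title:` / `Author:` header lines are assumptions. The `sample` parameter reflects the lesson learned: debug on a handful of books before committing to the multi-hour full run.

```python
import json
from pathlib import Path

def extract_metadata(path: Path) -> dict:
    """Pull title and author from a Project Gutenberg plain-text header (assumed layout)."""
    meta = {"path": str(path), "title": None, "author": None}
    with path.open(encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("Title:"):
                meta["title"] = line.partition(":")[2].strip()
            elif line.startswith("Author:"):
                meta["author"] = line.partition(":")[2].strip()
            elif line.startswith("*** START OF"):
                break  # header ends here; no need to scan the rest of the book
    return meta

def build_metadata(corpus_dir: str, out_path: str, sample: int = 0) -> None:
    """Walk the corpus and write one JSON record per book.

    Passing a small `sample` lets you debug the whole pipeline in seconds
    instead of discovering a bug hours into a full 31GB run.
    """
    files = sorted(Path(corpus_dir).rglob("*.txt"))
    if sample:
        files = files[:sample]
    with open(out_path, "w", encoding="utf-8") as out:
        for path in files:
            out.write(json.dumps(extract_metadata(path)) + "\n")

if __name__ == "__main__":
    # Hypothetical paths: test on ten books first, then drop `sample` for the full corpus.
    build_metadata("gutenberg/", "metadata.jsonl", sample=10)
```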