NEW KNOWLEDGE
I used PySpark for the first time while processing our data. I expected it to feel like ordinary Python, which I already know, so I thought it would be easy. I was wrong: there are many differences, and I needed time to learn it. PySpark is not a separate language; it is the Python API for the Spark platform, so the code is still Python. I also found out that CSV files are not handled the same way by every library. I created a CSV of our data using Pandas, and when we tried to load it in Spark, the file did not open properly. As a workaround, we read the file with Pandas and then converted the Pandas DataFrame to a Spark DataFrame.
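The workaround can be sketched roughly as follows. This is a minimal illustration, not our actual project code: the helper name `csv_to_spark_df`, the file name `our_data.csv`, and the app name `csv-demo` are all assumptions made up for the example. It relies on `pandas.read_csv` and on `SparkSession.createDataFrame`, which accepts a Pandas DataFrame.

```python
import pandas as pd


def csv_to_spark_df(spark, csv_path):
    """Read a CSV with Pandas, then hand the result to Spark."""
    # Let Pandas parse the file, since Pandas also wrote it.
    pandas_df = pd.read_csv(csv_path)
    # createDataFrame accepts a Pandas DataFrame and infers a Spark schema from it.
    return spark.createDataFrame(pandas_df)


def demo():
    # Imported here so the helper above can be used without importing PySpark eagerly.
    from pyspark.sql import SparkSession

    # "csv-demo" and "our_data.csv" are hypothetical names for illustration.
    spark = SparkSession.builder.appName("csv-demo").getOrCreate()
    spark_df = csv_to_spark_df(spark, "our_data.csv")
    spark_df.printSchema()
    spark.stop()
```

Calling `demo()` needs a working PySpark installation. Note that this route loads the whole file into memory on one machine first, so it probably only suits data that fits comfortably in Pandas.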
The fact that PySpark differs from the tools I am used to working with turned out to be a good thing, because I managed to learn it and have gained a new skill. I have always wanted to know at least two ways of completing a task: for example, both TensorFlow and PyTorch, both Spark and Hadoop, or both Pandas and PySpark. This is only the beginning, and the more new tools I use during this internship, the more I will learn.