Summer Internship 2020
Condor Agency provided me with the opportunity to gain more experience in the Data Science field and prepared me for my future career. I was part of the Rodilo project at the agency.
The goal of Rodilo is to provide up-to-date information about COVID-19, with a focus on Latin America. In this project we aimed to provide comprehensive information about COVID-19 cases, deaths, recoveries, lockdowns, mobility, and so on, at both the regional and country scale, and to keep it as current as possible. We used different sources for collecting data, but most of it was gathered via Bing.
Below, I summarize the tasks and their purposes:
1. Research what publicly available data can be leveraged for the purpose of each project:
Research was the most important part, since we needed to find up-to-date and complete data sources for our project. Through it I learned how to gather relevant information, analyze it in different ways, and find the best method of getting a task done. For instance, I searched Google for a better resource for updating the population data of different countries, which helped us produce more accurate results on the impact of COVID-19 in different areas.
2. Extract, transform, and merge data from different data sources:
This is one of the most important parts of Data Science. We have to be able to extract data from different websites, transform it into a table we can use later, and merge it with the current table in the database. For instance, I extracted population, case, and death data from Wikipedia and Worldometer for countries around the globe, transformed it into a data frame, and merged it with the data frame we already had in the database. This helped me understand how to join tables, how to extract data from a website, and how to transform it into a data frame. I used both my Python and R skills for this part, which was very good exercise.
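As a minimal sketch of the transform-and-merge step, the example below uses pandas with small hand-written data frames standing in for the scraped Wikipedia and Worldometer tables (the column names and figures are illustrative, not the project's real schema):

```python
import pandas as pd

# Hypothetical population figures, as might be scraped from Wikipedia.
population = pd.DataFrame({
    "country": ["Brazil", "Mexico", "Argentina"],
    "population": [212_559_000, 128_933_000, 45_196_000],
})

# Hypothetical COVID-19 counts, as might come from Worldometer.
covid = pd.DataFrame({
    "country": ["Brazil", "Mexico", "Argentina"],
    "cases": [2_000_000, 500_000, 300_000],
    "deaths": [75_000, 55_000, 6_000],
})

# Merge the two sources on the shared key, then derive a per-capita metric.
merged = covid.merge(population, on="country", how="left")
merged["cases_per_100k"] = merged["cases"] / merged["population"] * 100_000
print(merged)
```

A left join keeps every country from the case table even when a population figure is missing, which is usually what you want when the sources do not cover exactly the same countries.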
3. Understand the relationship between metrics and dimensions to establish principles and independent variables for predictive models:
This is a fundamental part of a Data Science career. Knowing the terminology used on the job is very important, and the tasks the company assigned me, such as calculating sums of metrics and assigning values to each dimension based on one or two metrics, helped me understand it well. I also learned different methods and tools for prediction, which is one of the essential tasks of any Data Scientist. I prepared prediction plots for case counts in different regions and countries.
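The report does not name the forecasting method used, so the sketch below only illustrates the general shape of the task: fit a model to a region's historical case counts and extrapolate a few days ahead. A simple linear trend via `numpy.polyfit` stands in for whatever model the project actually used, and the case numbers are made up:

```python
import numpy as np

# Hypothetical daily cumulative case counts for one region.
days = np.arange(10)
cases = np.array([100, 130, 170, 220, 260, 320, 380, 450, 520, 600])

# Fit a simple linear trend (an illustrative stand-in for the real model).
slope, intercept = np.polyfit(days, cases, 1)

# Forecast the next 3 days; these values would feed the prediction plot.
future = np.arange(10, 13)
forecast = slope * future + intercept
print(forecast.round())
```

In practice a plotting library would then draw the observed series and the forecast on the same axes to produce the prediction plots mentioned above.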
4. Find and execute opportunities to leverage programming languages like Python and R to automate processes that had previously been done manually:
This is where I learned how to create a master script: instead of running each script separately, I made one script that called all of them in sequence, which was very helpful. I also wrote a script that connects to the database (PostgreSQL) automatically to fetch data or upload it. Furthermore, I set up a connection to Google Sheets so that new tables could be uploaded there automatically after the data in the database was updated.
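One common way to build such a master script is to invoke each step as a subprocess and stop the pipeline if a step fails. The sketch below assumes hypothetical script names (`extract_data.py`, etc.); the demo at the bottom writes stub scripts to a temporary directory so the example is self-contained:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical pipeline steps; in the real project these would be the
# individual update scripts that used to be run by hand.
SCRIPTS = ["extract_data.py", "transform_data.py", "load_data.py"]

def run_pipeline(script_dir: Path, scripts=SCRIPTS) -> list:
    """Run each step in order; check=True aborts the run if one fails."""
    completed = []
    for name in scripts:
        subprocess.run([sys.executable, str(script_dir / name)], check=True)
        completed.append(name)
    return completed

# Self-contained demo: write three stub scripts and run them in order.
demo_dir = Path(tempfile.mkdtemp())
for name in SCRIPTS:
    (demo_dir / name).write_text(f"print('running {name}')\n")
print(run_pipeline(demo_dir))
```

The database and Google Sheets steps are not shown because the report does not say which client libraries were used; a typical choice would be a PostgreSQL driver such as psycopg2 for the database connection and a Sheets client such as gspread for the upload, but that is an assumption, not the project's documented stack.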