Data Clean Up and Analysis

Tools used: OpenRefine, Python, Python Libraries (Pandas and Matplotlib)

I decided to go back to what I learned in my data science classes from high school and freshman year and brush up on using Python scripts and libraries to clean, reorganize, and render the data.


Initial Goal: To find trends in how often the heat was turned on.

Steps taken:

  • Uploaded the raw CSV file into OpenRefine to delete the columns that restart the process and to split and align the remaining columns properly
  • Used a Python script to read the CSV file
  • Deleted the rows of new process data
  • Converted the binary heat values to floats so that their mean gives the percentage of time the heat was turned on
  • Found that the heater was on about 40% of the time, so a minority of the time
  • Exported the cleaned CSV data
  • Tried to plot the trends of heat against other variables (this is still a work in progress)
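
The Python steps above can be sketched roughly as follows. This is not my actual script, only a minimal illustration with pandas. The column names (`time`, `temp`, `heat`), the toy data, and the idea that a process restart shows up as a repeated header row are all assumptions made for the example, so the real file may need a different filter.

```python
import io
import pandas as pd


def clean_heat_log(csv_source, heat_col="heat"):
    """Read a log CSV and keep only rows with a numeric heat value.

    Assumption: rows written when the process restarts are not valid
    readings (here, a repeated header line), so their heat value cannot
    be parsed as a number.
    """
    df = pd.read_csv(csv_source)
    # Non-numeric heat values become NaN, then those rows are dropped.
    df[heat_col] = pd.to_numeric(df[heat_col], errors="coerce")
    df = df.dropna(subset=[heat_col]).reset_index(drop=True)
    # Store the 0/1 flag as float so its mean is the share of "on" rows.
    df[heat_col] = df[heat_col].astype(float)
    return df


def heat_on_percentage(df, heat_col="heat"):
    """Percentage of rows where the heat flag is 1."""
    return df[heat_col].mean() * 100


# Made-up sample standing in for the raw file (hypothetical columns).
sample = io.StringIO(
    "time,temp,heat\n"
    "1,20.5,0\n"
    "2,20.1,1\n"
    "time,temp,heat\n"  # simulated restart: the header is written again
    "3,19.8,0\n"
    "4,21.0,0\n"
)

cleaned = clean_heat_log(sample)
pct = heat_on_percentage(cleaned)
print(f"Heat was on {pct:.1f}% of the time")

# Exporting the cleaned data would then be one more call, for example:
# cleaned.to_csv("cleaned.csv", index=False)
```

Taking the mean of a 0/1 column works because every "on" row adds 1 to the sum, so the mean is the number of "on" rows divided by the total number of rows.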
Reflection: 

I feel good about the output. I think with more studying and time, I could focus on a wider array of insights and provide a more accurate analysis. Unfortunately, when dealing with data this large, data processing tools are fundamental, and I’m afraid that being rusty might have affected the quality of the analysis. I was reminded of how interesting and frustrating data analysis is, but it was a great opportunity to sit down and write code for data after such a long time.

The biggest struggle I had was the amount of data, and the cleaning process took a long time. I couldn’t do it through Python alone, so I had to get my hands dirty and edit using OpenRefine. One piece of feedback I have is that if we had totals and averages of the variables, it would have been easier to catch trends. If the data collector had an automated process that added up sums and averages of the collected data, it would have helped immensely.
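
For what it is worth, the totals and averages I wished for can probably be produced after the fact with pandas once the data is clean. The snippet below is only a sketch on made-up numbers, and the column names `temp` and `heat` are placeholders rather than the real dataset's columns.

```python
import pandas as pd

# Made-up readings; column names are placeholders for illustration.
readings = pd.DataFrame({
    "temp": [20.0, 22.0, 21.0, 19.0],
    "heat": [0.0, 1.0, 1.0, 0.0],
})

# Totals and averages of every column in one small table.
summary = readings.agg(["sum", "mean"])

# Average of the other variables, split by whether the heat was on.
by_heat = readings.groupby("heat").mean()

print(summary)
print(by_heat)
```

Comparing the per-group averages in `by_heat` is one simple way to start looking for a relationship between heat and the other variables before plotting anything.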

(To be edited: I will embed the source code for my process once I sort out the problems with my GitHub account.)
