
Data Clean Up and Analysis

Tools used: OpenRefine, Python, Python Libraries (Pandas and Matplotlib)

I decided to draw on my data science classes from high school and freshman year and brush up on using Python scripts and libraries to clean, reorganize, and render the data.


Initial Goal: To find trends in how often the heat was turned on.

Steps taken:

  • Uploaded the raw CSV file into OpenRefine to delete the columns that restart the process and to split and align the remaining columns properly
  • Used a Python script to read the CSV file
  • Deleted the rows of new process data
  • Converted the binary heat value to a float so that its mean gives the share of time the heat is turned on
  • Found that the heater was on about 40% of the time, so it was off for most of the period
  • Exported the cleaned CSV data
  • Tried to plot the trends of heat against other variables (this is still a work in progress)
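
Until I can embed my own code, here is a minimal sketch of what the Python steps above could look like with Pandas. It is not my actual script: the column names (`timestamp`, `temperature`, `heat`) and the sample rows are made up for illustration, and I am assuming that a restarted process shows up as a repeated header row in the middle of the file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the raw log; the real file has
# different (and many more) columns and rows.
RAW_CSV = """timestamp,temperature,heat
2024-01-01 00:00,18.5,0
2024-01-01 00:01,18.4,1
timestamp,temperature,heat
2024-01-01 00:02,18.9,1
2024-01-01 00:03,19.3,0
2024-01-01 00:04,19.1,0
"""


def clean_heat_log(source):
    """Read the log, drop restart rows, and convert the columns to floats."""
    # Read everything as text first so stray header rows do not break parsing.
    df = pd.read_csv(source, dtype=str)
    # Assumption: a restart repeats the header, so the "heat" cell of such a
    # row contains the column name itself instead of 0 or 1.
    df = df[df["heat"] != "heat"].copy()
    # Convert the binary heat flag (and the reading) from text to float.
    df["heat"] = df["heat"].astype(float)
    df["temperature"] = df["temperature"].astype(float)
    return df.reset_index(drop=True)


cleaned = clean_heat_log(io.StringIO(RAW_CSV))

# The mean of a 0/1 column is the fraction of rows where the heat was on.
heat_on_share = cleaned["heat"].mean()
print(f"Heater on {heat_on_share:.0%} of the time")

# Export the cleaned data; with no path, to_csv returns the CSV as a string.
cleaned_csv = cleaned.to_csv(index=False)
```

Passing a file path to `pd.read_csv` and `to_csv` instead of the in-memory sample would read and write real files.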

Reflection:

I feel good about the output. I think with more studying and time, I could focus on a wider array of insights and provide a more accurate analysis. Unfortunately, when dealing with data this large, data processing tools are fundamental, and I’m afraid that being rusty might have affected the quality of the analysis. I was reminded of how interesting and frustrating data analysis is, but it was a great opportunity to sit down and write code for data after such a long time.

The biggest struggle I had was the amount of data, and the cleaning process took a long time. I couldn’t do it through Python alone, so I had to get my hands dirty and edit using OpenRefine. One piece of feedback I have is that if we had totals and averages of the variables, it would have been easier to catch trends. If the data collector had an automated process that computed sums and averages of the collected data, it would have helped immensely.
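
As a rough illustration of the totals and averages I have in mind, the sketch below computes them with Pandas on a tiny made-up table. The column names and values are placeholders, not the real dataset, and this is only one way such a summary could be produced.

```python
import pandas as pd

# Hypothetical readings; the real data has other columns and far more rows.
readings = pd.DataFrame(
    {
        "temperature": [18.5, 18.4, 18.9, 19.3],
        "heat": [0.0, 1.0, 1.0, 0.0],
    }
)

# One row of totals and one row of averages for every column.
summary = readings.agg(["sum", "mean"])
print(summary)

# Average reading split by whether the heat was on, a first step toward
# spotting a trend between heat and another variable.
mean_temp_by_heat = readings.groupby("heat")["temperature"].mean()
print(mean_temp_by_heat)
```

A summary like this, generated automatically at collection time, would make it quicker to see which variables move together with the heat.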

(To be edited: I will embed the source code for my process once I sort out the problems with my GitHub account.)
