Arman Hajisafi

DATA ANALYST

Currently working at UNSW (University of New South Wales).
Turning complex datasets into meaningful insights.
Experienced in migrating data workflows from local stacks to cloud-based solutions.

About Me

Chapter 1 "Back in Time"

After completing university and earning my Computer Engineering degree, I began my career as a software developer in the tech industry. Over time, I explored various programming languages, took on diverse roles, and even stepped into the responsibilities of a Scrum Master. However, it was during this journey that I discovered my passion for working with data. Python became my tool of choice, as its versatility and simplicity allowed me to dive deeper into data analytics and automation. This shift not only shaped my technical expertise but also sparked a career path focused on solving complex data challenges.

Chapter 2 "Today"

As a Senior Data Analyst, I am part of the PVCESE Insights Team at the University of New South Wales (UNSW), where I specialize in designing and optimizing ETL processes to manage and analyze large-scale datasets from diverse sources, including the comprehensive QILT SES datasets. These government-endorsed surveys span the entire student lifecycle, from commencement to employment, providing valuable insights into student experiences and outcomes.
A significant portion of my work involves Azure Databricks, where I design and implement live tables to ingest, transform, and optimize data for seamless integration into Power BI dashboards. Using PySpark, I build scalable pipelines that enhance data workflows and ensure accurate, up-to-date insights are available for reporting and analysis.
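As a rough illustration of the live-table pattern described above, here is a minimal Delta Live Tables sketch. It is not code from my actual pipelines: the table names, storage path, and column names are all placeholder assumptions, and a definition like this only runs inside a Databricks DLT pipeline (where `spark` is provided), not as a standalone script.

```python
# Hedged sketch of a Delta Live Tables definition. Runs only inside a
# Databricks DLT pipeline; all names and paths here are illustrative.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw survey responses ingested from landing storage")
def ses_raw():
    # Ingest raw files from a (hypothetical) landing zone
    return spark.read.format("json").load("/mnt/landing/ses/")


@dlt.table(comment="Cleaned responses ready for the Power BI model")
def ses_clean():
    # Read the upstream live table, keep completed responses,
    # and stamp each row with its ingestion time
    return (
        dlt.read("ses_raw")
        .filter(F.col("response_status") == "complete")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

Because each `@dlt.table` function declares its inputs via `dlt.read`, the pipeline engine infers the dependency graph and keeps downstream tables (and the dashboards reading them) up to date.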
Working in both local and cloud-based environments has further enhanced my expertise in Python and R, while strengthening my skills in Azure services. Visualization remains a key part of my work, where I rely on Power BI to translate complex datasets into clear, actionable dashboards and reports. By combining my technical skills with my deep understanding of UNSW’s data landscape as both a Senior Data Analyst and a current Master’s student, I contribute to delivering impactful insights that support data-driven decision-making across the university.

Apache Iceberg vs. Delta Lake

I’ve been following the feature sets of Apache Iceberg and Delta Lake for a while, enjoying their competition within the data community.

While I haven’t worked directly with Iceberg, Databricks’ recent acquisition of Tabular (the company founded by Iceberg’s creators) has made me wonder about the two formats’ future; perhaps a merger is on the horizon.

If you’re curious to explore their differences in features, this easy-to-read article breaks it all down for you:


Polars vs. PySpark!

If you’re working in a large or mid-sized company, chances are you handle data on the cloud and rely on PySpark (assuming Python is your language of choice for Spark). However, if you’re in a smaller company, freelancing, or have the flexibility to choose between cloud and local environments, you might wonder which tool serves you best: Pandas, Polars, or PySpark.

Personally, I’m a big fan of Polars for its performance and efficiency. However, to ensure maintainability for the next person who works on my code, I usually stick with Pandas. That said, in today’s Spark era, PySpark has become my go-to, given its integration with cloud platforms and ability to handle large-scale data processing.