Skip to main content

(This is Part II of a two-part blog series. If you missed Pari I, you can find it here). 

So, how does a career-long data nerd decide when it’s preferable to turn to Excel or Python for a given analytics project? Let’s break it down together.

My Rules of Thumb for Choosing Excel vs. Python

My informal, intuitive rules of thumb for when I use Excel vs. Python are as follows.

Excel

Ideal for:

  1. Getting a visual feel for raw data at the beginning of the project
  2. Making quick calculations
  3. Building simple charts and graphs
  4. Doing data aggregation using Pivot tables

Python

Ideal for:

  1. Building repeatable processes that I’ll need to run multiple times
  2. Working with very large data sets of more than 1 million rows
  3. Building machine learning models
  4. Creating complex visualizations

Let’s dive a bit deeper into the specific task types to demonstrate how I differentiate my use of one tool or the other.

Task Types I Typically Carry Out in Excel

Exploring smaller data sets: Excel lets me handle smaller datasets quickly, allowing for quick manipulations and visualizations without coding. “Small” is a relative term: with over 1MM rows and over 16K columns, it offers plenty of capacity for most projects that could come your way.

Easier to use for data beginners: With lots of built-in help features and a user interface that’s easy to explore to find new point-and-click operations, Excel is very easy to get started with even if you have no spreadsheet experience

Performing quick calculations: There is often no good reason to take the time to import a CSV file into Python if I just need to do a quick calculation such as summing a single column or counting the number of nonblank rows in another. Popping open Excel couldn’t be easier for such simple tasks.

Splitting text out into multiple columns with text-to-columns: When I need to manipulate my data by splitting text from a single column into multiple columns quickly, Excel’s text-to-columns function makes this a breeze

Aggregating data with Pivot tables: Pivot tables in Excel are the one of the coolest data tools you’ll ever use. They make it easy to summarize data sets by counts, averages, sums and more. Need to know average transaction value of Widget ABC last month? Drag your transaction source file into a Pivot table and you’ll have the answer in 15 seconds. Note: Python, too, has a Pivot table feature. But with Excel’s drag-and-drop feature, it’s just way easier to do this in Excel than in Pandas unless you are doing it as part of a larger set of operations inside Python.

Sharing data files with other people: Excel’s widespread use makes it ideal for sharing raw data and finished analyses with colleagues and clients. (Granted, it’s not as easy to share Excel files as it is to share Google Sheets files, but that’s a topic for another post). Almost everyone has Excel installed on their computer or can easily get it, so it’s far easier to share Excel files with others than trying to share a Jupyter Notebook or raw Python script. This is especially true if you are sharing raw data sets.

Pulling in specific data from a different worksheet with VLOOKUP: The versatile VLOOKUP function helps in matching and retrieving data from one worksheet to another based on the values in a given column. For example: you need to add an email address column for each customer to a spreadsheet and all you have is customer ID, you can use VLOOKUP to pull the email data in from another table that links email to customer ID.

Identifying “dirty data” patterns in raw data set quickly: Excel is extremely visual and hands-on by design. It allows you to quickly filter, sort and explore a set of raw data before you tackle it with Excel (or Python) for cleansing.

Carrying out simple data visualizations: Creating basic charts and graphs in Excel is intuitive and requires minimal time, compared to Python

Finding an input value that will yield a desired result in another cell with goal seek: Sometimes you need to figure out what value you would need to put into cell A in order to make cell B show a given target value by adjusting the value in cell C. Excel’s built-in goal seek feature is a great way to do this.


Task Types I Typically Carry Out in Python

Working with larger data sets: Python handles massive datasets more efficiently than Excel, with less risk of crashes or slowdowns. It is really only limited by your computer’s RAM memory

Building and carrying out multi-step, repeatable scripts: Python really hits its stride when you need to more complex, multi-step data manipulation processes that can be easily “run” with the touch of a button, once the script it written. Excel has something akin to this called VBA macros, but in practice they don’t work the same way.

When there is an emphasis on data quality: Python enforces a particular data type in a given column, unlike Excel. This means that you must declare (or assign) a data type like text, floats, or dates to a column. While this feels like an additional constraint, it’s actually extremely useful when it comes to accuracy, finding incorrect data types in a column, and maximizing computing efficiency.

Handling large sets of data: Python’s available libraries – like Pandas – efficiently handle and describe large datasets that might overwhelm Excel. As long as your computer’s processor is up the task, Pandas can crunch the data. (If you are working with a super-large data set, try the successor library to Pandas called Polars. Polars allows you to keep data both in memory AND on disk, allowing for large data sets despite RAM limitations).

Ingesting data from external sources via an API key: Python’s versatility shines when importing data from various sources like cloud-based databases via an API or when scraping data from a website. While Excel allows data import from a variety of data sources and in various formats, many cloud-based resources require you to use an API key or scrape date. This is where Python shines.

Building machine learning models: Python’s extensive machine learning library options like Scikit-learn, PyTorch and TensorFlow make it ideal for building and implementing ML models. Machine learning involves a complex training phase whereby a process is repeated hundreds or thousands of times to find the best-fitting model. A script-based environment like Python is well-suited for this.

Performing complex data visualizations: Libraries like Matplotlib and Seaborn enable creation of sophisticated, customizable data visualizations

Running data projects in the cloud: Python is a nearly universally-accepted language by most major cloud provides. Amazon Web Services, Microsoft Azure and Google Cloud Platform all allow you to use Python to run commands and manage projects

Versatility beyond data manipulation and analysis: Excel is purpose-built for handling text and numerical data, as well as performing visualizations. While Python Pandas, too, is about data, Python itself is a multipurpose language. So, in the same program or script, you can use Python to not only ingest, clean and visualize data – but also run a chatbot, push information out to a website, build web applications and do dozens of other things. Excel just can’t match that.

Tasks That Are a Toss-Up

There are a few types of tasks that are truly a toss-up as to whether I reach for Excel or Python to carry it out. Here they are:

Data cleansing: Both Excel and Python offer unique strengths for cleaning, combining and preparing datasets, depending on the dataset size and complexity

Exploratory analysis: Each tool takes different approaches to initial data exploration, suiting various analysis needs. Excel is great for getting a quick visual “feel” for the data. It’s scroll feature makes that a breeze. However, Python Pandas has built-in features like pandas.describe() and pandas.shape() that allow you to instantaneously build a statistical summary of a given DataFrame, calculating minimum values, maximum values, counts, averages, sums and more.

Building formulas based on data in other columns: When I need to create a new column and fill it with data that is derived from one or more other columns, I find myself going to Excel and Pandas fairly equally for this task.

Having trouble choosing? If you love Excel but want to dabble in Python, consider this relatively new project, Python in Excel that lets you run mini Python scripts inside Excel. I haven’t played around with this yet but it looks fascinating!

***

JLytics’ mission is to empower CEOs, founders and business executives to leverage the power of data in their everyday lives so that they can focus on what they do best: lead.
Start the Conversation

Interested in exploring a relationship with a data partner dedicated to supporting executive decision-making? Start the conversation today with JLytics.