How to merge cells in Excel using Python pandas?

We have been writing articles on Python (that can develop websites for you!), especially with respect to excel and PDFs. You can read this article, on, How to generate a Graph for Excel data using Python Pandas?

The other article that adds a border to an excel sheet is here.

This article is about merging two cells using pandas our favorite package in python that does many jobs. So, let us quickly figure this out using python and pandas.

As always, we will be importing pandas at the beginning of our program.

If you want to know more about installing python and the beginner tutorials, you can read the beginning of this article.

import pandas as pan

Once the pandas package is imported as “pan”, we will be getting into the data frames.

A data frame is a two-dimensional array whose data can be appended or deleted from it at any point in time. This comes in handy especially when there is a structure like excel where we have to export and import data.

We will be creating this kind of a data frame and assign a two-dimensional array as follows:

dafr = pan.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

We can understand from the above code that we have assigned the name of the car brand along with the model in the two-dimensional array.

Now, we have to start appending the array into another array that we will declare this way:

startCells = [1]

Once another array is ready, we will start appending the data to this array. The data we append will be only the name of the car brands that are appearing in the array for the first time.

The code for this goes as follows:

for row in range(2,len(dafr)+1):
    if (dafr.loc[row-1,'Name'] != dafr.loc[row-2,'Name']):
        startCells.append(row)

In the above code, you need to understand a few functions that make programming very convenient as you will see further in the article.

The first function is range. The range is a built-in python function that takes into account the data that is specified within the parenthesis that form the parameters of the function.

In our code, the range starts from the second position of the array and extends till one is added to the length of the data frame.

To select a particular item from the range, we use ‘loc‘ function of the python package, “pandas“.

So, in the above code, we are trying to remove the duplicates from the names and store only the names that are unique. In our case, we will be considering only the Tesla, Toyota, and Ford, removing their duplicates.

writer = pan.ExcelWriter('mergecells.xlsx', engine='xlsxwriter')

In the code above, an excel writer is created. This is where the engine ‘xlswriter’ is made use of to create an excel ‘xlsx’.

dafr.to_excel(writer, sheet_name='mergesheet', index=False)

This simple line of code is quite obvious. It is just populating the data that is in the data frame into excel. In addition to that, a sheet name called ‘sheet1′ is created.

The index parameter writes row names of ‘true’, in our case is false.

workbook = writer.book
worksheet = writer.sheets['mergesheet']

The above code creates both an excel file and the sheet

merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})

To add merge format to the excel that we just created we need to use the code above. You might have already thought about the number of functions that python and the packages built for python contain.

Note that the alignments of the cell are the parameters of the function ‘add_format’.

Assign the last row of the data frame with the length of the data frame for further processing as follows.

lastRow = len(dafr)

The next step has to be explained completely with all the remaining code, but for the last line of code.

The following is the code that we are talking about:

We start from the first line by starting the iteration. If the row is the row end then we stop the process of merging and start to write the data frame to the excel.

If the row is not the end row, we merge the cells making it one.

There is an exception to this rule when the iteration comes to the last row. In such situations, the data frame is written to the excel file for the last time. Even the last but one will be merged.

for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, dafr.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, dafr.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, dafr.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, dafr.loc[row-1,'Name'], merge_format)

The last line is to save the excel that we have written and which is readily sorted. This is achieved by the following code.

writer.save()

You can, instead, execute the below code and get the following results.

import pandas as pan

dafr = pan.DataFrame({'Name': ['Tesla','Tesla','Toyota','Ford','Ford','Ford'],
                   'Type': ['Model X','Model Y','Corolla','Bronco','Fiesta','Mustang']})

startCells = [1]

for row in range(2,len(dafr)+1):
    if (dafr.loc[row-1,'Name'] != dafr.loc[row-2,'Name']):
        startCells.append(row)

writer = pan.ExcelWriter('mergecells.xlsx', engine='xlsxwriter')
dafr.to_excel(writer, sheet_name='mergesheet', index=False)
workbook = writer.book
worksheet = writer.sheets['mergesheet']
merge_format = workbook.add_format({'align': 'center', 'valign': 'vcenter', 'border': 2})

lastRow = len(dafr)

for row in startCells:
    try:
        endRow = startCells[startCells.index(row)+1]-1
        if row == endRow:
            worksheet.write(row, 0, dafr.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, endRow, 0, dafr.loc[row-1,'Name'], merge_format)
    except IndexError:
        if row == lastRow:
            worksheet.write(row, 0, dafr.loc[row-1,'Name'], merge_format)
        else:
            worksheet.merge_range(row, 0, lastRow, 0, dafr.loc[row-1,'Name'], merge_format)


writer.save()