Automating Excel Report Generation

Generating a formatted Excel report by hand once is manageable. Generating one every Monday, for every region, with conditional highlights, charts, and a summary row, is not. Python automates the entire sequence: pull data, aggregate, write styled cells, embed charts, and save — with no manual post-processing.

This guide covers the full workflow: library selection, pandas ExcelWriter integration, openpyxl and xlsxwriter styling, multiple sheets, summary rows, conditional formatting, and embedded charts. The techniques here complement the broader Python for Excel & CSV Data Processing guide.

Prerequisites

Python 3.9+, a virtual environment, and the following packages:

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install pandas openpyxl xlsxwriter

requirements.txt for a locked environment:

pandas==2.2.2
openpyxl==3.1.2
xlsxwriter==3.2.0

Create a sample CSV to follow along:

mkdir -p data output
python - <<'EOF'
import pandas as pd, numpy as np
rng = np.random.default_rng(42)
regions = ["North", "South", "East", "West"]
products = ["Widget A", "Widget B", "Gadget X"]
rows = []
for month in range(1, 7):
    for _ in range(40):
        rows.append({
            "month": f"2026-{month:02d}",
            "region": rng.choice(regions),
            "product": rng.choice(products),
            "units": int(rng.integers(10, 200)),
            "revenue": round(float(rng.uniform(500, 8000)), 2),
        })
pd.DataFrame(rows).to_csv("data/sales.csv", index=False)
print("data/sales.csv written")
EOF

Inspect the Input Before Writing

Always validate the incoming DataFrame before you write anything to Excel. Silent type coercions produce wrong number formats.

# pip install pandas
from pathlib import Path
import pandas as pd

INPUT = Path("data/sales.csv")

try:
    df = pd.read_csv(INPUT)
    print(df.dtypes)         # confirm numeric cols are not object
    print(df.isnull().sum()) # find gaps before aggregation
    print(df.head(3))
except FileNotFoundError:
    raise SystemExit(f"Missing input file: {INPUT}")

Expected: revenue and units as float64/int64. If they show as object, the CSV contains currency symbols or commas — strip them with pd.to_numeric(df["revenue"].str.replace(r"[$,]", "", regex=True), errors="coerce") before proceeding.

Data → Styler → Workbook → Sheets: The Pipeline

The diagram below shows how raw data flows through the four-stage pipeline this guide implements.

Step 1 — Aggregate the Data

# pip install pandas
from pathlib import Path
import pandas as pd

INPUT = Path("data/sales.csv")

try:
    df = pd.read_csv(INPUT, parse_dates=["month"])
except FileNotFoundError:
    raise SystemExit(f"Missing: {INPUT}")

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["units"] = pd.to_numeric(df["units"], errors="coerce")
df.dropna(subset=["revenue", "units"], inplace=True)

# Regional summary
by_region = (
    df.groupby("region", as_index=False)
    .agg(total_revenue=("revenue", "sum"),
         total_units=("units", "sum"),
         transactions=("revenue", "count"))
)
by_region["avg_order"] = by_region["total_revenue"] / by_region["transactions"]

# Monthly trend
by_month = (
    df.groupby("month", as_index=False)
    .agg(total_revenue=("revenue", "sum"))
    .sort_values("month")
)

print(by_region)
print(by_month)

Step 2 — Write Multiple Sheets with Styled Headers

pandas.ExcelWriter delegates to xlsxwriter or openpyxl as the engine. Use xlsxwriter when creating a new file with heavy formatting; use openpyxl when you need to modify an existing file (see Reading Excel Files with Python for that path).

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report.xlsx")
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

# Re-use by_region and by_month from Step 1
try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
        by_month.to_excel(writer, sheet_name="Monthly Trend", index=False, startrow=1)

        wb = writer.book

        # --- define reusable formats ---
        hdr_fmt = wb.add_format({
            "bold": True, "bg_color": "#2563eb", "font_color": "#ffffff",
            "border": 1, "font_size": 12,
        })
        currency_fmt = wb.add_format({"num_format": "#,##0.00", "border": 1})
        int_fmt      = wb.add_format({"num_format": "#,##0",    "border": 1})
        text_fmt     = wb.add_format({"border": 1})

        for sheet_name, frame in [("Summary", by_region), ("Monthly Trend", by_month)]:
            ws = writer.sheets[sheet_name]
            # Write styled header row (row 0, overwriting the blank startrow)
            for col_idx, col_name in enumerate(frame.columns):
                ws.write(1, col_idx, col_name, hdr_fmt)
            ws.set_row(1, 20)

        # Column widths for Summary sheet
        ws_sum = writer.sheets["Summary"]
        ws_sum.set_column("A:A", 14, text_fmt)
        ws_sum.set_column("B:D", 16, currency_fmt)

except PermissionError:
    raise SystemExit(f"Close {OUTPUT} in Excel before running this script")
except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Step 3 — Number Formats and Summary Rows

A summary row at the bottom of the data gives stakeholders a quick total without opening a formula bar.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False, startrow=1)
        wb = writer.book
        ws = writer.sheets["Summary"]

        currency_fmt = wb.add_format({"num_format": "#,##0.00", "bold": False})
        total_fmt    = wb.add_format({"num_format": "#,##0.00", "bold": True,
                                      "top": 2, "bg_color": "#f1f5f9"})
        label_fmt    = wb.add_format({"bold": True, "bg_color": "#f1f5f9"})

        n_rows = len(by_region)
        # data starts at Excel row 3 (0-indexed row 2) because startrow=1 + 1 header
        data_start = 3
        data_end   = data_start + n_rows - 1

        # Apply number format to revenue and avg_order columns
        for r in range(data_start, data_end + 1):
            ws.write(r - 1, 1, by_region.iloc[r - data_start]["total_revenue"], currency_fmt)
            ws.write(r - 1, 3, by_region.iloc[r - data_start]["avg_order"],     currency_fmt)

        # Summary / totals row
        summary_row = data_end  # 0-indexed
        ws.write(summary_row, 0, "TOTAL", label_fmt)
        ws.write_formula(summary_row, 1,
                         f"=SUM(B{data_start}:B{data_end})", total_fmt)
        ws.write_formula(summary_row, 2,
                         f"=SUM(C{data_start}:C{data_end})", total_fmt)
        ws.write(summary_row, 3, "", total_fmt)

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written with summary row: {OUTPUT}")

Step 4 — Conditional Formatting

Highlight underperforming regions (revenue below a threshold) in red and top performers in green — without touching a single cell manually.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report_cf.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False)
        wb = writer.book
        ws = writer.sheets["Summary"]

        green_fmt = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
        red_fmt   = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})

        n = len(by_region)
        rev_range = f"B2:B{n + 1}"   # revenue column, skip header

        # Values above 10 000 → green
        ws.conditional_format(rev_range, {
            "type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
        })
        # Values below 5 000 → red
        ws.conditional_format(rev_range, {
            "type": "cell", "criteria": "<", "value": 5_000, "format": red_fmt
        })

        # 3-color scale on units column
        ws.conditional_format(f"C2:C{n + 1}", {
            "type": "3_color_scale",
            "min_color": "#FFC7CE",
            "mid_color": "#FFEB9C",
            "max_color": "#C6EFCE",
        })

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Step 5 — Embed a Bar Chart

Charts embedded in the workbook are generated from the data already on the sheet — no separate image file required.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/sales_report_chart.xlsx")

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary", index=False)
        wb = writer.book
        ws = writer.sheets["Summary"]

        n = len(by_region)

        chart = wb.add_chart({"type": "column"})
        chart.add_series({
            "name":       "Total Revenue",
            "categories": ["Summary", 1, 0, n, 0],   # region labels (col A)
            "values":     ["Summary", 1, 1, n, 1],   # revenue values (col B)
            "fill":       {"color": "#2563eb"},
            "gap":        50,
        })
        chart.set_title({"name": "Revenue by Region"})
        chart.set_x_axis({"name": "Region"})
        chart.set_y_axis({"name": "Revenue ($)", "num_format": "#,##0"})
        chart.set_legend({"none": True})
        chart.set_size({"width": 480, "height": 288})

        ws.insert_chart("F2", chart)

except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

print(f"Written: {OUTPUT}")

Edge Cases and Variants

Modifying an Existing Template with openpyxl

When you need to inject data into a pre-branded template, openpyxl is the right tool. Note that opening in read_only mode prevents writes — see Fix openpyxl Read-Only Mode Error if you hit that error.

# pip install openpyxl pandas
from pathlib import Path
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.styles import Font, PatternFill, numbers

TEMPLATE = Path("templates/report_template.xlsx")
OUTPUT   = Path("output/report_filled.xlsx")

try:
    wb = load_workbook(TEMPLATE)           # NOT read_only — we need to write
    ws = wb["Data"]

    # Clear old data below header
    for row in ws.iter_rows(min_row=2, max_row=ws.max_row):
        for cell in row:
            cell.value = None

    # Write new data
    for r_idx, row in enumerate(dataframe_to_rows(by_region, index=False, header=False), start=2):
        for c_idx, value in enumerate(row, start=1):
            ws.cell(row=r_idx, column=c_idx, value=value)

    # Apply currency format to column B
    for cell in ws["B"][1:]:
        cell.number_format = numbers.FORMAT_NUMBER_COMMA_SEPARATED2

    wb.save(OUTPUT)
except FileNotFoundError as exc:
    raise SystemExit(f"Template not found: {exc}")

print(f"Template filled: {OUTPUT}")

Writing Excel Formulas and Charts Directly

For advanced formula patterns (array formulas, named ranges, dynamic arrays), see the Writing Excel Formulas and Charts with openpyxl guide.

Memory-Efficient Writes with `use_constant_memory`

For DataFrames with more than 500 000 rows, xlsxwriter's constant-memory mode streams rows without buffering the whole sheet in RAM. The trade-off: you cannot go back and set cell formats after writing.

# pip install pandas xlsxwriter
from pathlib import Path
import pandas as pd

OUTPUT = Path("output/large_report.xlsx")

options = {"constant_memory": True}

try:
    with pd.ExcelWriter(OUTPUT, engine="xlsxwriter",
                        engine_kwargs={"options": options}) as writer:
        # Write in chunks if the DataFrame is chunked from a database
        big_df.to_excel(writer, sheet_name="Data", index=False)
except Exception as exc:
    raise SystemExit(f"Export failed: {exc}")

Validation

Assert the output is structurally correct before treating the run as successful:

# pip install openpyxl
from pathlib import Path
from openpyxl import load_workbook

OUTPUT = Path("output/sales_report.xlsx")

wb = load_workbook(OUTPUT, read_only=True)
assert "Summary" in wb.sheetnames, "Summary sheet missing"
ws = wb["Summary"]
rows = list(ws.iter_rows(values_only=True))
assert len(rows) >= 2, "No data rows written"
assert rows[0][1] is not None, "Column B header missing"
wb.close()
print("Validation passed")

Performance and Scale

Row limit: Excel supports ~1 048 576 rows per sheet. For datasets beyond 100 000 rows consider writing to CSV and referencing it via Power Query instead of embedding all data in the workbook.
Multiple sheets: Each to_excel(..., sheet_name=...) call within the same ExcelWriter context adds a sheet without re-opening the file.
Formula recalculation: xlsxwriter writes formulas as strings; Excel recalculates on open. If you need values pre-calculated, write the computed value with xlsxwriter's write() instead of write_formula(), or use openpyxl with data_only=True after the file has been opened and saved by Excel once.
Parquet intermediary: If the ETL step is expensive, save the aggregated DataFrames to Parquet and load from there for the styling step — decouples compute from presentation.

Troubleshooting

Error	Root cause	Fix
`zipfile.BadZipFile`	Output file was left open by Excel when the script ran	Close the file in Excel first, or write to a temp path then rename
`KeyError: 'Summary'` when accessing `writer.sheets`	Sheet name not yet created — `to_excel` must be called before accessing `writer.sheets[name]`	Confirm `to_excel(..., sheet_name="Summary")` executes before the `writer.sheets` lookup
`IllegalCharacterError`	Cell value contains control characters (e.g. `\x00`) from a database	Strip with `df.replace(r"[\x00-\x1f]", "", regex=True)` before writing
`PermissionError` on save	Target `.xlsx` is open in another application	Write to a new timestamped filename or close the file
`AttributeError: 'Worksheet' object has no attribute 'add_format'`	Mixed `openpyxl`/`xlsxwriter` API — `add_format` is `xlsxwriter`-only	Use `openpyxl`'s `PatternFill`/`Font` objects when the engine is `openpyxl`

Complete Working Script

#!/usr/bin/env python3
# pip install pandas xlsxwriter openpyxl
"""
generate_sales_report.py — produce a formatted multi-sheet Excel report.
Usage: python generate_sales_report.py --input data/sales.csv --output output/report.xlsx
"""
import argparse
from pathlib import Path
import pandas as pd


def load_and_aggregate(csv_path: Path):
    df = pd.read_csv(csv_path)
    df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
    df["units"]   = pd.to_numeric(df["units"],   errors="coerce")
    df.dropna(subset=["revenue", "units"], inplace=True)

    by_region = (
        df.groupby("region", as_index=False)
        .agg(total_revenue=("revenue", "sum"),
             total_units=("units", "sum"),
             transactions=("revenue", "count"))
    )
    by_region["avg_order"] = (
        by_region["total_revenue"] / by_region["transactions"]
    ).round(2)

    by_month = (
        df.groupby("month", as_index=False)
        .agg(total_revenue=("revenue", "sum"))
        .sort_values("month")
    )
    return by_region, by_month


def write_report(by_region: "pd.DataFrame", by_month: "pd.DataFrame",
                 output: Path) -> None:
    output.parent.mkdir(parents=True, exist_ok=True)

    with pd.ExcelWriter(output, engine="xlsxwriter") as writer:
        by_region.to_excel(writer, sheet_name="Summary",       index=False)
        by_month.to_excel( writer, sheet_name="Monthly Trend", index=False)

        wb = writer.book

        hdr_fmt      = wb.add_format({"bold": True, "bg_color": "#2563eb",
                                      "font_color": "#ffffff", "font_size": 12})
        currency_fmt = wb.add_format({"num_format": "#,##0.00"})
        green_fmt    = wb.add_format({"bg_color": "#C6EFCE", "font_color": "#006100"})
        red_fmt      = wb.add_format({"bg_color": "#FFC7CE", "font_color": "#9C0006"})

        # --- Summary sheet ---
        ws = writer.sheets["Summary"]
        for c, name in enumerate(by_region.columns):
            ws.write(0, c, name, hdr_fmt)
        n = len(by_region)
        ws.set_column("A:A", 14)
        ws.set_column("B:D", 16, currency_fmt)
        ws.conditional_format(f"B2:B{n+1}", {
            "type": "cell", "criteria": ">=", "value": 10_000, "format": green_fmt
        })
        ws.conditional_format(f"B2:B{n+1}", {
            "type": "cell", "criteria": "<",  "value": 5_000,  "format": red_fmt
        })
        # Total row
        ws.write(n + 1, 0, "TOTAL",
                 wb.add_format({"bold": True, "bg_color": "#f1f5f9"}))
        ws.write_formula(n + 1, 1, f"=SUM(B2:B{n+1})",
                         wb.add_format({"num_format": "#,##0.00", "bold": True,
                                        "bg_color": "#f1f5f9"}))

        # Bar chart
        chart = wb.add_chart({"type": "column"})
        chart.add_series({
            "name":       "Revenue",
            "categories": ["Summary", 1, 0, n, 0],
            "values":     ["Summary", 1, 1, n, 1],
            "fill":       {"color": "#2563eb"},
        })
        chart.set_title({"name": "Revenue by Region"})
        chart.set_y_axis({"num_format": "#,##0"})
        chart.set_legend({"none": True})
        chart.set_size({"width": 400, "height": 240})
        ws.insert_chart("F2", chart)

        # --- Monthly Trend sheet ---
        ws2 = writer.sheets["Monthly Trend"]
        for c, name in enumerate(by_month.columns):
            ws2.write(0, c, name, hdr_fmt)
        ws2.set_column("A:B", 18, currency_fmt)


def main():
    parser = argparse.ArgumentParser(description="Generate formatted Excel sales report")
    parser.add_argument("--input",  default="data/sales.csv",     help="Input CSV path")
    parser.add_argument("--output", default="output/report.xlsx", help="Output .xlsx path")
    args = parser.parse_args()

    csv_path = Path(args.input)
    out_path = Path(args.output)

    if not csv_path.exists():
        raise SystemExit(f"Input not found: {csv_path}")

    try:
        by_region, by_month = load_and_aggregate(csv_path)
        write_report(by_region, by_month, out_path)
        print(f"Report written to {out_path}")
    except PermissionError:
        raise SystemExit(f"Cannot write {out_path} — close it in Excel first")
    except Exception as exc:
        raise SystemExit(f"Report generation failed: {exc}")


if __name__ == "__main__":
    main()

FAQ

Which engine should I use — openpyxl or xlsxwriter? Use xlsxwriter when creating a new file with heavy formatting and charts — it is faster and has a richer format API. Use openpyxl when loading and modifying an existing .xlsx file. You cannot use xlsxwriter to read or modify existing files.

How do I add a second header row (e.g. a report title above the column headers)? Pass startrow=2 to to_excel, then use ws.merge_range("A1:D1", "Monthly Report", title_fmt) before writing the data. The column header row lands at row 2 (0-indexed row 1).

Why do my formulas show as 0 when I open the file?xlsxwriter writes formulas as strings and relies on Excel to calculate on open. If you need pre-calculated values, write the Python-computed value directly with ws.write(row, col, value, fmt). Alternatively, open the file in openpyxl with data_only=False, then save — Excel will recalculate on next open.

Can I password-protect the generated report?xlsxwriter supports workbook-level protection: wb.set_properties(...) and ws.protect("password"). For PDF output with password protection, see the Watermarking and Securing PDFs guide.

How do I schedule this to run automatically? On Linux/macOS, add a cron entry: 0 7 1 * * /path/.venv/bin/python /path/generate_sales_report.py. See Automating Monthly Sales Reports in Excel for a full scheduling and delivery walkthrough.

Automating Monthly Sales Reports in Excel — end-to-end scheduled report with multi-sheet groupby output
Writing Excel Formulas and Charts with openpyxl — advanced formula patterns and chart types
Reading Excel Files with Python — ingestion patterns before the report generation step
Fix openpyxl Read-Only Mode Error — resolve write errors when loading an existing workbook

Part of Python for Excel & CSV Data Processing.

Explore next

Automating Monthly Sales Reports in Excel Fix openpyxl Read-Only Mode Error