Data Readiness for a Smooth AI Evaluation Process

How to Assemble, Cleanse, and Format Data for Reliable AI Models

Artificial intelligence is not just about algorithms — it's about data readiness. For utilities, implementing AI in finance begins with assembling data from multiple systems, cleansing it for consistency, and formatting it so the AI can "see" relationships across accounting, operations, and engineering. The steps below outline how to prepare your organization's data so that AI models can produce accurate and actionable insights.

Five-Stage Data Pipeline for AI Implementation

Stage 1: Assemble Data (ERP, WMS, CIS, AMS, SCADA)
Stage 2: Cleanse Data (remove duplicates, fix errors)
Stage 3: Normalize (standard formats, consistent keys)
Stage 4: Structure (CSV, SQL, Parquet; relational model)
Stage 5: Automate (scheduled feeds, continuous updates)

Common Data Sources
• ERP/Accounting (GL, budgets)
• Work Management (work orders)
• CIS (billing, usage, payments)
• Asset Management (depreciation)
• SCADA/AMI (operational data)
• Regulatory records (grants, RUS)

Common Linking Keys
• Work order number
• GL account number
• Asset or project ID
• Customer/service location
• Date/time stamps

Preferred AI-Ready Formats
• CSV/Excel (pilot projects)
• SQL databases (continuous training)
• Parquet files (data lakes)
• Power BI dataflows (dashboards)
• JSON/API (real-time feeds)
• Cloud pipelines (Azure, AWS)

1. Assembling the Data

The first step is to bring together all data influencing financial performance — often spread across different utility systems.

• ERP / Accounting System: general ledger transactions, journal entries, cost centers, budgets. Typical formats: CSV export, SQL query, or API (SAP, Munis, Tyler, etc.)
• Work Management System (WMS): work orders, labor and materials, project status. Typical formats: Excel/CSV, API, or database
• Customer Information System (CIS): billing, usage, payment history, rate class. Typical formats: SQL, CSV, or JSON
• Asset Management System (AMS): asset ID, installation date, cost, depreciation schedule. Typical formats: CSV, EAM export, or integration feed
• Operational Systems (SCADA, OMS, AMI): energy output, outage durations, meter data, temperature. Typical formats: CSV, XML, or API
• Regulatory / Grant Records: FEMA project numbers, reimbursement documentation, RUS forms. Typical format: PDF plus a structured index (OCR or metadata extraction)

Once assembled, merge data around common keys such as work order number, GL account number, asset or project ID, and customer or service location number. These identifiers connect engineering activity with accounting outcomes — for instance, linking a feeder upgrade work order to depreciation and CIAC accounting entries.
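As a sketch, linking work orders to general ledger activity on a shared work order number might look like the following in pandas. The column names (work_order_no, gl_account, amount) are illustrative assumptions, not a standard utility schema:

```python
import pandas as pd

# Work orders exported from the WMS.
work_orders = pd.DataFrame({
    "work_order_no": ["WO-1001", "WO-1002"],
    "description": ["Feeder upgrade", "Transformer replacement"],
    "asset_id": ["A-55", "A-72"],
})

# GL transactions exported from the ERP, keyed to the same work orders.
gl_transactions = pd.DataFrame({
    "work_order_no": ["WO-1001", "WO-1001", "WO-1002"],
    "gl_account": ["107.1", "271.0", "107.1"],
    "amount": [42000.00, 15000.00, 8800.00],
})

# Left-join GL activity onto work orders so every order keeps a row
# even if no accounting entries have posted yet.
merged = work_orders.merge(gl_transactions, on="work_order_no", how="left")
```

A left join is the safer default here: it surfaces work orders with no posted accounting activity instead of silently dropping them.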

2. Cleansing and Normalizing the Data

AI performance depends on data quality. Cleansing ensures your data is consistent, complete, and ready for model training.

• Inconsistent account names ("Plant Additions" vs. "Plant Addition"): apply a controlled vocabulary (FERC/RUS Uniform System of Accounts)
• Duplicate work orders (the same project entered twice): de-duplicate on the unique work order number
• Missing or invalid dates ("1/0/2020" or blank): infer missing values from the nearest valid entry
• Mis-categorized costs (engineering labor coded to materials): reclassify with rules or an ML-based classifier
• Non-numeric fields ("$1,000 (est.)"): convert to numeric and strip special characters

The goal is to output every dataset in a machine-readable, tabular format — typically CSV, Parquet, or structured database tables. Think of the end product as a data model, where tables such as "Work Orders," "GL Transactions," and "Assets" share defined relationships.
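A few of the cleansing actions above can be sketched in pandas. The column names and the vocabulary map are illustrative assumptions:

```python
import pandas as pd

# Raw export with a label variant, a duplicate row, and messy cost strings.
raw = pd.DataFrame({
    "work_order_no": ["WO-1001", "WO-1001", "WO-1002"],
    "account_name": ["Plant Additions", "Plant Addition", "Plant Additions"],
    "cost": ["$42,000", "$42,000", "$1,000 (est.)"],
})

# Controlled vocabulary: collapse variant labels to one standard name.
vocab = {"Plant Addition": "Plant Additions"}
raw["account_name"] = raw["account_name"].replace(vocab)

# De-duplicate: after normalization the two WO-1001 rows are identical.
clean = raw.drop_duplicates()

# Numeric conversion: strip "$", commas, and trailing notes like "(est.)".
clean = clean.assign(
    cost=clean["cost"].str.replace(r"[^0-9.]", "", regex=True).astype(float)
)
```

The result is a tidy, typed table that can be written to CSV, Parquet, or a database without further manual handling.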


3. Preferred Formats for AI Training and Analysis

After cleansing, structure your data in a consistent, relational format for long-term use. Common storage and integration formats include:

• CSV / Excel tables: initial model training and simple datasets; ideal for pilot projects and proofs of concept
• SQL databases (PostgreSQL, SQL Server): continuous model training and dashboards; enables querying and version control
• Parquet files (data lake): large-scale storage for AI/ML; scalable, efficient, and cloud-ready
• Power BI dataflows / models: visualization plus Copilot/AI integration; works natively with Microsoft AI tools
• JSON / API feeds: real-time integration with live systems; supports continuous AI retraining
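As one example of graduating from pilot CSVs to a queryable store, a cleansed table can be loaded into a SQL database. SQLite serves here as a stand-in for PostgreSQL or SQL Server, and the schema is illustrative:

```python
import sqlite3
import pandas as pd

# A cleansed asset table, as produced by the steps in Section 2.
assets = pd.DataFrame({
    "asset_id": ["A-55", "A-72"],
    "install_date": ["2018-06-01", "2021-03-15"],
    "cost": [125000.0, 98000.0],
})

# Load the DataFrame into a SQL table; pandas accepts a DBAPI
# sqlite3 connection directly for this.
conn = sqlite3.connect(":memory:")
assets.to_sql("assets", conn, index=False, if_exists="replace")

# The table can now feed dashboards or model-training queries.
total = conn.execute("SELECT SUM(cost) FROM assets").fetchone()[0]
```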

4. Automating and Updating the Data

AI delivers the best results when fed regular, automated data updates. Set up scheduled feeds and automations using nightly or weekly exports from ERP, WMS, or CIS; SQL connectors or Power BI dataflows; Robotic Process Automation (RPA) for legacy systems; and cloud data pipelines such as Azure Data Factory or AWS Glue.
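The core of any such scheduled feed is an incremental load: append only the rows that have not already been captured. A minimal sketch, assuming a unique transaction ID as the de-duplication key:

```python
import pandas as pd

# Rows already loaded into the financial data store.
warehouse = pd.DataFrame({
    "txn_id": [1, 2],
    "posted": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

# Tonight's export from the ERP; it overlaps the last load by one row.
nightly_export = pd.DataFrame({
    "txn_id": [2, 3],
    "posted": pd.to_datetime(["2024-01-02", "2024-01-03"]),
})

# Keep only rows not already loaded, then append.
new_rows = nightly_export[~nightly_export["txn_id"].isin(warehouse["txn_id"])]
warehouse = pd.concat([warehouse, new_rows], ignore_index=True)
```

Tools such as Azure Data Factory, AWS Glue, or Power BI dataflows wrap this same pattern in scheduling, logging, and retry logic.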

These processes evolve into a financial data lake — a living repository feeding AI dashboards, forecasts, and automated variance explanations.

Key Takeaway

AI implementation succeeds when utilities treat data as a strategic asset, not just a byproduct of accounting and operations. Consistent formats, validated records, and unified identifiers create the foundation for reliable automation, predictive modeling, and intelligent financial reporting.

About Russ Hissom – Article Author

Russ Hissom, CPA is a principal of Utility Accounting & Rates Specialists, a firm providing cost-of-service and rate studies, expert witness testimony, and consulting services to electric, gas, water, wastewater, and broadband utilities. Russ also leads UtilityEducation.com, an online training platform offering NASBA-registered CPE courses in accounting, rates, construction accounting, financial analysis, and AI applications for utilities.

Learn more at uarsconsulting.com or contact Russ at russ.hissom@uarsconsulting.com.
