Building a Schema-Aware Database Seeder CLI: Going Beyond Random Data

April 28, 2026 · 2 min read

Seeding a database is easy, but generating realistic and structured data is not. This CLI tool takes a different approach by using schema introspection and naming conventions to automatically produce meaningful, context-aware data for real-world development.

Seeding a database is easy.

Seeding it with useful, realistic, and structured data is not.

After dealing with repetitive seeders across SaaS and ERP projects, I built a CLI tool that does something slightly different:

It generates data based on column naming conventions and database types, not just randomness.

Repo: database-seeder-CLI

The Problem with Most Seeders

Typical approaches fall into two categories:

Manual seeders → accurate but slow and repetitive
Random generators → fast but unrealistic

The issue is not generating data.

It is generating data that:

Matches your schema
Feels realistic
Works across multiple tables
Scales with your database structure

The Approach: Schema-Driven Seeding

Instead of hardcoding values, the CLI inspects your database directly:

SHOW TABLES → DESCRIBE table → infer columns → generate data

From there, it applies two layers of logic:

Column name heuristics (primary signal)
SQL type fallbacks (secondary safety net)
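
The introspection step above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it assumes `conn` behaves like a mysql2/promise connection whose `query()` resolves to `[rows, fields]`, and the names `introspect` and `quoteId` are hypothetical.

```javascript
// Sketch: discover tables and their columns via SHOW TABLES / DESCRIBE.
// Assumption: `conn.query(sql)` resolves to [rows, fields], as in mysql2/promise.

// Quote an identifier so unusual table names cannot break the statement.
function quoteId(name) {
  return '`' + String(name).replace(/`/g, '``') + '`';
}

async function introspect(conn) {
  const [tableRows] = await conn.query('SHOW TABLES');
  // Each row has a single key such as "Tables_in_mydb"; take its value.
  const tables = tableRows.map((row) => Object.values(row)[0]);

  const schema = {};
  for (const table of tables) {
    const [columns] = await conn.query('DESCRIBE ' + quoteId(table));
    schema[table] = columns.map((col) => ({
      field: col.Field,
      type: String(col.Type).toLowerCase(),
      nullable: col.Null === 'YES',
      autoIncrement: String(col.Extra).includes('auto_increment'),
    }));
  }
  return schema;
}
```

Normalizing each column into a small `{ field, type, nullable, autoIncrement }` object keeps the later generation steps independent of the raw DESCRIBE output.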

Core Idea: Naming Conventions as Signals

The tool treats column names as intent.

if (field.includes('email')) return faker.internet.email();
if (field.includes('price')) return faker.finance.amount();
if (field.includes('slug')) return faker.helpers.slugify(...);

This simple pattern unlocks a lot:

email → real emails
avatar → working image URLs
tech_stack → JSON arrays of technologies
tags → structured arrays instead of strings

This makes seed data actually usable for UI, APIs, and testing.
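
One way to organize these name checks is an ordered rule table. The sketch below is an assumption about structure, not the tool's real implementation: `NAME_RULES`, `SAMPLE_TAGS`, and `resolveByName` are hypothetical names, and `faker` is passed in (for example the export of `@faker-js/faker`) so the resolver itself has no dependencies.

```javascript
// Sketch: ordered name rules; the first matching pattern wins.
// Assumption: `faker` exposes the @faker-js/faker API used below.

const SAMPLE_TAGS = ['node', 'mysql', 'react', 'docker', 'redis']; // hypothetical values

const NAME_RULES = [
  // Specific patterns come first: "email_verified_at" contains "email"
  // but should be a date, not an address.
  [/_at$/, (f) => f.date.past()],
  [/email/, (f) => f.internet.email()],
  [/avatar/, (f) => f.image.avatar()],
  [/price|amount/, (f) => f.finance.amount()],
  [/slug/, (f) => f.helpers.slugify(f.lorem.words(3))],
  // Serialize arrays so they can be stored in a JSON or text column.
  [/tech_stack|tags/, (f) => JSON.stringify(f.helpers.arrayElements(SAMPLE_TAGS, 3))],
];

function resolveByName(field, faker) {
  const name = String(field).toLowerCase();
  for (const [pattern, generate] of NAME_RULES) {
    if (pattern.test(name)) return generate(faker);
  }
  return undefined; // no name signal: the caller falls back to the SQL type
}
```

Because plain substring matching is greedy, rule order matters. Putting suffix rules such as `_at` ahead of `email` avoids writing an email address into a timestamp column.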

Type-Based Fallbacks

When naming is not enough, the CLI falls back to SQL types:

if (/^(int|bigint)/.test(type)) return faker.number.int();
if (/^(decimal|float)/.test(type)) return faker.finance.amount();
if (type.startsWith('date')) return faker.date.past();

This ensures:

No column is left unhandled
Generated values match each column's declared type
Inserts don’t break due to invalid formats
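
A slightly fuller version of the type fallback might parse the DESCRIBE type string first, so that lengths and enum options are respected. This is a sketch under assumptions: `parseType` and `fallbackByType` are hypothetical helpers, `faker` is injected as it would be from `@faker-js/faker`, and the handling of each type is illustrative rather than what the tool ships.

```javascript
// Sketch: parse a DESCRIBE type such as "varchar(255)" or "enum('a','b')"
// and pick a value that fits it. Assumption: `faker` follows @faker-js/faker.

function parseType(rawType) {
  const match = String(rawType).match(/^(\w+)(?:\((.*)\))?/);
  return {
    base: match ? match[1].toLowerCase() : '',
    args: match && match[2] ? match[2] : '', // e.g. "255" or "'a','b'"
  };
}

function fallbackByType(rawType, faker) {
  const { base, args } = parseType(rawType);
  switch (base) {
    case 'tinyint':
      return faker.number.int({ min: 0, max: 1 }); // often used as a boolean flag
    case 'int':
    case 'bigint':
      // Bounded so the value always fits a signed 32-bit INT column.
      return faker.number.int({ min: 1, max: 2147483647 });
    case 'decimal':
    case 'float':
    case 'double':
      return faker.finance.amount();
    case 'date':
    case 'datetime':
    case 'timestamp':
      return faker.date.past();
    case 'enum': {
      // Strip the surrounding quotes from each declared option.
      const options = args.split(',').map((v) => v.trim().replace(/^'|'$/g, ''));
      return faker.helpers.arrayElement(options);
    }
    case 'char':
    case 'varchar': {
      // Truncate so the text never exceeds the declared length.
      const maxLength = Number.parseInt(args, 10) || 255;
      return faker.lorem.sentence().slice(0, maxLength);
    }
    default:
      return faker.lorem.sentence();
  }
}
```

Passing explicit bounds to `faker.number.int` and truncating strings to the declared length are the two details most likely to prevent out-of-range or "data too long" errors on insert.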

Rich Text Detection (Underrated Feature)

One interesting piece is rich-text handling.

const RICH_TEXT_TYPES = ['longtext', 'mediumtext'];

If detected, the CLI can generate Quill-compatible HTML:

<h2> headings
<p> paragraphs
<strong> formatting
<ul> lists

This is a big deal for:

CMS systems
Blog platforms
SaaS dashboards

Because plain lorem text is not enough when your UI expects formatted content.
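
To make the idea concrete, here is a small sketch of detection plus HTML assembly. It is an illustration under assumptions: `isRichText` and `buildRichHtml` are hypothetical helpers, the exact markup the tool emits is not shown in this article, and the text pieces are passed in (they could come from `faker.lorem`).

```javascript
// Sketch: detect rich-text columns and assemble simple editor-friendly HTML.
// Assumption: the input strings are plain words with no HTML special characters.
const RICH_TEXT_TYPES = ['longtext', 'mediumtext'];

function isRichText(type) {
  return RICH_TEXT_TYPES.includes(String(type).toLowerCase());
}

function buildRichHtml({ heading, paragraphs, bullets }) {
  const parts = ['<h2>' + heading + '</h2>'];
  paragraphs.forEach((text, index) => {
    // Bold the first paragraph so inline formatting is exercised too.
    parts.push(index === 0 ? '<p><strong>' + text + '</strong></p>' : '<p>' + text + '</p>');
  });
  if (bullets.length > 0) {
    parts.push('<ul>' + bullets.map((item) => '<li>' + item + '</li>').join('') + '</ul>');
  }
  return parts.join('');
}

// Example usage with fixed strings.
const html = buildRichHtml({
  heading: 'Release notes',
  paragraphs: ['Summary line', 'More detail'],
  bullets: ['First', 'Second'],
});
console.log(html);
```

Keeping the output to a small set of tags (headings, paragraphs, bold, lists) makes it more likely that a rich-text editor will load the content without stripping anything.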

Interactive CLI Flow

Instead of hardcoding tables, the tool uses an interactive flow:

Select tables (multi-select)
Choose row count
Detect rich-text fields and ask for HTML mode

const selectedTables = await checkbox(...)
const rowCount = await input(...)
const useHtml = await confirm(...)

This makes it flexible across projects without changing code.
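
The three prompts could be wired together along these lines. This is a sketch, not the tool's code: the article does not name the prompt library, so the object-style options below assume something like `@inquirer/prompts`, and `collectOptions` plus the `schema` shape (`{ table: [{ field, type }] }`) are hypothetical.

```javascript
// Sketch: gather seeding options from the user.
// Assumption: `prompts` provides checkbox/input/confirm functions that take
// an options object and return a promise, as @inquirer/prompts does.
async function collectOptions(schema, prompts) {
  const selectedTables = await prompts.checkbox({
    message: 'Select tables to seed',
    choices: Object.keys(schema).map((table) => ({ name: table, value: table })),
  });

  const rowCountRaw = await prompts.input({
    message: 'Rows per table',
    default: '10',
    validate: (value) => /^[1-9]\d*$/.test(value) || 'Enter a positive integer',
  });

  // Only ask about HTML when a selected table has a rich-text column.
  const hasRichText = selectedTables.some((table) =>
    schema[table].some((col) => ['longtext', 'mediumtext'].includes(col.type))
  );
  const useHtml = hasRichText
    ? await prompts.confirm({ message: 'Generate HTML for rich-text columns?', default: true })
    : false;

  return {
    selectedTables,
    rowCount: Number.parseInt(rowCountRaw, 10),
    useHtml,
  };
}
```

Passing the prompt functions in as an argument keeps the flow testable without a terminal: a test can supply stub functions that return canned answers.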

Insert Strategy (Important Detail)

The seeding logic is not just a naive bulk insert.

It:

Skips auto-increment fields
Filters out null values
Handles row-level failures without stopping the process

await conn.execute(filteredSQL, filteredValues);

This is important for real-world usage where:

Constraints exist
Some rows might fail
You don’t want the entire seed process to crash
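
The behaviour described above can be sketched as a per-row insert with isolated error handling. This is a minimal illustration under assumptions: `buildInsert` and `seedRows` are hypothetical names, `columns` uses a `{ field, autoIncrement }` shape chosen for this example, and `conn.execute(sql, values)` is assumed to work like mysql2/promise.

```javascript
// Sketch: build one parameterized INSERT per row and keep going on failure.
// Assumption: `conn.execute(sql, values)` returns a promise, as in mysql2/promise.
function buildInsert(table, columns, row) {
  const usable = columns
    .filter((col) => !col.autoIncrement)       // let MySQL assign these
    .filter((col) => row[col.field] != null);  // drop null/undefined values
  if (usable.length === 0) return null;

  const names = usable.map((col) => '`' + col.field + '`').join(', ');
  const marks = usable.map(() => '?').join(', ');
  return {
    sql: 'INSERT INTO `' + table + '` (' + names + ') VALUES (' + marks + ')',
    values: usable.map((col) => row[col.field]),
  };
}

async function seedRows(conn, table, columns, rows) {
  const result = { inserted: 0, failed: [] };
  for (const [index, row] of rows.entries()) {
    const insert = buildInsert(table, columns, row);
    if (!insert) continue; // nothing to insert for this row
    try {
      await conn.execute(insert.sql, insert.values);
      result.inserted += 1;
    } catch (err) {
      // Record the failure (for example a unique-key clash) and move on.
      result.failed.push({ index, message: err.message });
    }
  }
  return result;
}
```

Returning a summary of inserted and failed rows, rather than throwing, lets the CLI report problems at the end while still seeding everything that was valid.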

Why This Actually Matters

This is not just a convenience tool.

It improves:

1. Developer Experience

Spin up realistic environments instantly.

2. Frontend Testing

UI components behave properly with real-looking data.

3. API Validation

Endpoints return meaningful payloads.

4. System Simulation

Get closer to production-like scenarios without using real data.

Trade-Offs

This approach is not perfect.

Naming Dependency

It assumes your schema follows good conventions.

Limited Domain Awareness

It does not understand business rules deeply.

Relationships Still Need Care

Foreign keys and relational consistency are not fully inferred.

Where This Fits Best

This tool is ideal for:

SaaS development
Admin dashboards
Rapid prototyping
Internal tools
Developer onboarding

It is less suited for:

Complex relational simulations
Domain-heavy test scenarios

Engineering Takeaway

This project highlights something simple but important:

Good developer tools are not about complexity. They are about removing friction intelligently.

Instead of writing more code, this approach uses:

Schema introspection
Naming conventions
Smart defaults

Together, these automate a task developers do constantly.

Final Thought

Seeding is often treated as a minor task.

But in practice, it directly affects how fast you can build, test, and iterate.

A small improvement here compounds across every project.

And sometimes, that is where the most practical engineering wins happen.

Author

Jose Albert Arnedo

Full-Stack Engineer focused on ERP systems and SaaS platforms