Building a Schema-Aware Database Seeder CLI: Going Beyond Random Data

April 28, 2026 · 2 min read

Seeding a database is easy, but generating realistic and structured data is not. This CLI tool takes a different approach by using schema introspection and naming conventions to automatically produce meaningful, context-aware data for real-world development.

Seeding a database is easy.

Seeding it with useful, realistic, and structured data is not.

After dealing with repetitive seeders across SaaS and ERP projects, I built a CLI tool that does something slightly different:

It generates data based on column naming conventions and database types, not just randomness.

Repo: database-seeder-CLI

The Problem with Most Seeders

Typical approaches fall into two categories:

  • Manual seeders → accurate but slow and repetitive
  • Random generators → fast but unrealistic

The issue is not generating data.

It is generating data that:

  • Matches your schema
  • Feels realistic
  • Works across multiple tables
  • Scales with your database structure

The Approach: Schema-Driven Seeding

Instead of hardcoding values, the CLI inspects your database directly:

JS
SHOW TABLES → DESCRIBE table → infer columns → generate data
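
The pipeline above can be sketched roughly as follows. This is a minimal illustration, assuming a connection object that behaves like a mysql2/promise connection (its `query()` resolves to `[rows, fields]`). The helper name `introspectSchema` and the shape it returns are hypothetical, not taken from the repo.

```javascript
// Hypothetical helper: walks SHOW TABLES → DESCRIBE for every table.
// `conn` is assumed to behave like a mysql2/promise connection.
async function introspectSchema(conn) {
  const [tableRows] = await conn.query('SHOW TABLES');
  // SHOW TABLES returns a single column named `Tables_in_<database>`,
  // so read the first value of each row instead of hardcoding the key.
  const tables = tableRows.map((row) => Object.values(row)[0]);

  const schema = {};
  for (const table of tables) {
    // Identifiers cannot be bound as `?` parameters, so escape backticks by hand.
    const safeName = '`' + String(table).replace(/`/g, '``') + '`';
    const [columns] = await conn.query(`DESCRIBE ${safeName}`);
    schema[table] = columns.map((col) => ({
      field: col.Field,
      type: String(col.Type).toLowerCase(),
      nullable: col.Null === 'YES',
      autoIncrement: String(col.Extra).includes('auto_increment'),
    }));
  }
  return schema;
}
```

Normalizing each column to `{ field, type, nullable, autoIncrement }` up front means the later generation and insert steps never need to know the raw `DESCRIBE` column names.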

From there, it applies two layers of logic:

  1. Column name heuristics (primary signal)
  2. SQL type fallbacks (secondary safety net)

Core Idea: Naming Conventions as Signals

The tool treats column names as intent.

JS
if (field.includes('email')) return faker.internet.email();
if (field.includes('price')) return faker.finance.amount();
if (field.includes('slug')) return faker.helpers.slugify(...);

This simple pattern unlocks a lot:

  • email → realistic email addresses
  • avatar → working image URLs
  • tech_stack → JSON arrays of technologies
  • tags → structured arrays instead of strings

This makes seed data actually usable for UI, APIs, and testing.
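
One way to keep a growing set of name heuristics manageable is a rule table instead of a chain of `if` statements. The sketch below is an assumption about how that could look, not the repo's implementation: the patterns, the sample technology list, and the function name `valueFromName` are all illustrative. `faker` is passed in as an argument; with `@faker-js/faker` installed you would pass its `faker` export.

```javascript
// Illustrative rule table: the first matching pattern wins, so order matters.
// These patterns are assumptions, not the rules used by the actual CLI.
const NAME_RULES = [
  { pattern: /email/, make: (f) => f.internet.email() },
  { pattern: /avatar/, make: (f) => f.image.avatar() },
  { pattern: /price|amount/, make: (f) => f.finance.amount() },
  { pattern: /slug/, make: (f) => f.helpers.slugify(f.lorem.words(3)) },
  {
    // Serialize the array so it can be bound to a JSON or text column.
    pattern: /tags|tech_stack/,
    make: (f) =>
      JSON.stringify(f.helpers.arrayElements(['node', 'mysql', 'react', 'docker'], 2)),
  },
];

// Returns a generated value, or undefined when no rule matches so the
// caller can fall back to the SQL type.
function valueFromName(field, faker) {
  const name = field.toLowerCase();
  const rule = NAME_RULES.find((r) => r.pattern.test(name));
  return rule ? rule.make(faker) : undefined;
}
```

Substring matching is deliberately loose, which cuts both ways: a column such as `email_verified_at` would match the `email` rule, so more specific patterns need to come earlier in the table.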

Type-Based Fallbacks

When naming is not enough, the CLI falls back to SQL types:

JS
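// faker.number.int() defaults to a max of Number.MAX_SAFE_INTEGER, which overflows a signed INT
if (/^(int|bigint)/.test(type)) return faker.number.int({ max: 2147483647 });
if (/^(decimal|float|double)/.test(type)) return faker.finance.amount();
if (type.startsWith('date')) return faker.date.past(); // matches both date and datetime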

This ensures:

  • No column is left unhandled
  • Values stay compatible with each column's declared type
  • Inserts don’t break due to invalid formats
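
To make the "inserts don't break" goal concrete, a fallback can read the size out of the `DESCRIBE` type string and keep values within range. The following is a hedged sketch: `valueFromType` and the `INT_MAX` table are hypothetical names, the integer limits are the signed MySQL maximums, and `faker` is again injected by the caller.

```javascript
// Upper bounds for signed MySQL integer types. BIGINT is capped at
// Number.MAX_SAFE_INTEGER because JS numbers lose precision beyond it.
const INT_MAX = new Map([
  ['tinyint', 127],
  ['smallint', 32767],
  ['mediumint', 8388607],
  ['int', 2147483647],
  ['bigint', Number.MAX_SAFE_INTEGER],
]);

// Hypothetical fallback keyed on the DESCRIBE `Type` string, e.g. "varchar(50)".
function valueFromType(type, faker) {
  // Capture the base type and, when present, the first number in parentheses.
  const match = /^([a-z]+)(?:\((\d+))?/.exec(type.toLowerCase());
  const base = match ? match[1] : '';
  const size = match && match[2] ? Number(match[2]) : undefined;

  if (INT_MAX.has(base)) return faker.number.int({ min: 0, max: INT_MAX.get(base) });
  if (['decimal', 'float', 'double'].includes(base)) return faker.finance.amount();
  if (['date', 'datetime', 'timestamp'].includes(base)) return faker.date.past();
  if (base === 'varchar' || base === 'char') {
    // Trim to the declared length so strict SQL mode does not reject the row.
    return faker.lorem.sentence().slice(0, size ?? 255);
  }
  // Anything unrecognized gets plain text as a last resort.
  return faker.lorem.sentence();
}
```

A caller would try the name heuristics first and only reach this function when they return nothing. Note that the sketch ignores `DECIMAL` precision, so a narrow column such as `decimal(3,2)` could still reject a generated amount.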

Rich Text Detection (Underrated Feature)

One interesting piece is rich-text handling.

JS
const RICH_TEXT_TYPES = ['longtext', 'mediumtext'];

If detected, the CLI can generate Quill-compatible HTML:

  • <h2> headings
  • <p> paragraphs
  • <strong> formatting
  • <ul> lists

This is a big deal for:

  • CMS systems
  • Blog platforms
  • SaaS dashboards

Because plain lorem text is not enough when your UI expects formatted content.
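
As a rough illustration of the idea, the sketch below detects rich-text columns by type and assembles a small HTML fragment from the tags listed above. The layout (one heading, one paragraph, one three-item list) and the function names are assumptions, not the CLI's actual output format; `faker` is supplied by the caller.

```javascript
const RICH_TEXT_TYPES = ['longtext', 'mediumtext'];

// True when the DESCRIBE type marks a column as a rich-text candidate.
function isRichTextColumn(type) {
  return RICH_TEXT_TYPES.includes(type.toLowerCase());
}

// Escape generated text so stray characters cannot break the markup.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Builds one heading, one paragraph with a bold lead-in, and one list.
function richTextHtml(faker) {
  const heading = escapeHtml(faker.lorem.sentence());
  const lead = escapeHtml(faker.lorem.words(3));
  const body = escapeHtml(faker.lorem.paragraph());
  const items = [1, 2, 3].map(() => `<li>${escapeHtml(faker.lorem.sentence())}</li>`);
  return [
    `<h2>${heading}</h2>`,
    `<p><strong>${lead}</strong> ${body}</p>`,
    `<ul>${items.join('')}</ul>`,
  ].join('');
}
```

Because type alone cannot tell a rich-text field from a plain long note, treating `isRichTextColumn` as a hint and confirming with the user is the safer design.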

Interactive CLI Flow

Instead of hardcoding tables, the tool uses an interactive flow:

  1. Select tables (multi-select)
  2. Choose row count
  3. Detect rich-text fields and ask for HTML mode

JS
const selectedTables = await checkbox(...);
const rowCount = await input(...);
const useHtml = await confirm(...);

This makes it flexible across projects without changing code.
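
A fuller version of that flow might look like the sketch below. It assumes `prompts` exposes `checkbox`, `input`, and `confirm` with the same call shape as `@inquirer/prompts`, and that `schema` maps each table name to an array of `{ field, type }` objects. The function name, prompt messages, and defaults are hypothetical.

```javascript
// `prompts` is injected so the flow can be exercised without a terminal.
// With @inquirer/prompts installed, pass { checkbox, input, confirm } from it.
async function askSeedOptions(prompts, schema) {
  const selectedTables = await prompts.checkbox({
    message: 'Select tables to seed',
    choices: Object.keys(schema).map((table) => ({ name: table, value: table })),
  });

  // input() resolves to a string, so validate it before converting.
  const rowCountText = await prompts.input({
    message: 'Rows per table',
    default: '10',
    validate: (value) =>
      /^[1-9]\d*$/.test(value.trim()) || 'Enter a positive whole number',
  });

  // Only ask about HTML mode when a selected table has a rich-text column.
  const hasRichText = selectedTables.some((table) =>
    schema[table].some((col) => ['longtext', 'mediumtext'].includes(col.type))
  );
  const useHtml = hasRichText
    ? await prompts.confirm({ message: 'Generate HTML for rich-text columns?', default: true })
    : false;

  return { selectedTables, rowCount: Number(rowCountText), useHtml };
}
```

Skipping the HTML question when no rich-text column is selected keeps the prompt sequence short for tables that do not need it.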

Insert Strategy (Important Detail)

The seeding logic is not just a naive bulk insert.

It:

  • Skips auto-increment fields
  • Filters out null values
  • Handles row-level failures without stopping the process

JS
await conn.execute(filteredSQL, filteredValues);

This is important for real-world usage where:

  • Constraints exist
  • Some rows might fail
  • You don’t want the entire seed process to crash
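
The three behaviours above can be sketched as a statement builder plus a per-row loop. This is an illustration under assumptions: `columns` is an array of `{ field, autoIncrement }` objects, `conn.execute(sql, values)` behaves like mysql2's prepared-statement call, and "filtering nulls" is read here as omitting those columns from the INSERT. The names `buildInsert` and `seedRows` are hypothetical.

```javascript
// Builds a parameterized INSERT, omitting auto-increment columns and
// columns whose generated value is null or undefined.
function buildInsert(table, columns, row) {
  const kept = columns.filter(
    (col) => !col.autoIncrement && row[col.field] !== null && row[col.field] !== undefined
  );
  if (kept.length === 0) return null;
  const quote = (id) => '`' + String(id).replace(/`/g, '``') + '`';
  const sql =
    `INSERT INTO ${quote(table)} (${kept.map((c) => quote(c.field)).join(', ')}) ` +
    `VALUES (${kept.map(() => '?').join(', ')})`;
  return { sql, values: kept.map((c) => row[c.field]) };
}

// Inserts rows one at a time so a single constraint violation is recorded
// and skipped instead of aborting the whole run.
async function seedRows(conn, table, columns, rows) {
  let inserted = 0;
  const failures = [];
  for (const row of rows) {
    const stmt = buildInsert(table, columns, row);
    if (!stmt) continue;
    try {
      await conn.execute(stmt.sql, stmt.values);
      inserted += 1;
    } catch (err) {
      failures.push({ row, message: err.message });
    }
  }
  return { inserted, failures };
}
```

Row-by-row inserts are slower than a single multi-row statement, but they make it possible to report exactly which rows failed and why.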

Why This Actually Matters

This is not just a convenience tool.

It improves:

1. Developer Experience

Spin up realistic environments instantly.

2. Frontend Testing

UI components behave properly with real-looking data.

3. API Validation

Endpoints return meaningful payloads.

4. System Simulation

Closer to production-like scenarios without real data.

Trade-Offs

This approach is not perfect.

Naming Dependency

It assumes your schema follows good conventions.

Limited Domain Awareness

It does not understand business rules deeply.

Relationships Still Need Care

Foreign keys and relational consistency are not fully inferred.

Where This Fits Best

This tool is ideal for:

  • SaaS development
  • Admin dashboards
  • Rapid prototyping
  • Internal tools
  • Developer onboarding

It is less suited for:

  • Complex relational simulations
  • Domain-heavy test scenarios

Engineering Takeaway

This project highlights something simple but important:

Good developer tools are not about complexity. They are about removing friction intelligently.

Instead of writing more code, this approach automates a task developers do constantly by relying on:

  • Schema introspection
  • Naming conventions
  • Smart defaults

Final Thought

Seeding is often treated as a minor task.

But in practice, it directly affects how fast you can build, test, and iterate.

A small improvement here compounds across every project.

And sometimes, that is where the most practical engineering wins happen.

Author

Jose Albert Arnedo

Full-Stack Engineer focused on ERP systems and SaaS platforms