Technical guide

Complete Tutorial: Build a Dart CLI like gedcom_matcher from scratch

This guide walks you through creating a production-ready Dart command-line tool that compares two GEDCOM files and outputs likely person matches with confidence scoring, much like gedcom_matcher.

You will build:

  1. A reusable core library (parser adapter, normalizer, matcher, formatter).
  2. A CLI interface with robust argument parsing.
  3. Automated tests.
  4. Multi-format output (table, JSON, CSV, Markdown).
  5. A clean package API that can be used both from the terminal and from Dart code.

1. Prerequisites

Install the Dart SDK and verify the installation:

dart --version

Recommended:

  • Dart 3.10+.
  • Git.
  • Optional: FVM if you manage multiple SDKs.

2. Create the package skeleton

Create a console package:

dart create -t console gedcom_matcher_like
cd gedcom_matcher_like

This gives you:

  • A package manifest (pubspec.yaml).
  • A bin/ entry point.
  • A lib/ library file.
  • Initial tests under test/.

3. Add dependencies

You need:

  • args for CLI options.
  • gedcom_parser for GEDCOM parsing.
  • test and lints in dev dependencies.

dart pub add args
dart pub add gedcom_parser
dart pub add dev:test
dart pub add dev:lints
dart pub get

Sources: the pub.dev pages for args, gedcom_parser, test, and lints (listed in section 16).

4. Define your architecture first

Use this layered structure:

  • Core domain models.
  • GEDCOM adapter.
  • Text normalization.
  • Matching/scoring engine.
  • Output formatter.
  • CLI orchestrator.

Design principle:

  • Keep terminal parsing separate from matching logic.
  • Make matching logic fully testable without filesystem access.
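A minimal sketch of that principle, assuming a hypothetical `ReadText` function type: the orchestrator is the only code that touches the filesystem, and it receives its reader as a parameter, so tests can pass an in-memory map instead of real files. The "matching" here (`countSharedNames`, which counts names present in both inputs) is a deliberately trivial stand-in for the engine built in step 8.

```dart
import 'dart:io';

/// Hypothetical signature for "give me the text stored at this path".
typedef ReadText = Future<String> Function(String path);

/// Default reader used by the real CLI: the only filesystem access.
Future<String> readFromDisk(String path) => File(path).readAsString();

/// Pure core: no I/O, so unit tests can pass plain strings.
/// Trivial stand-in for the real matcher: counts names (one per line)
/// present in both inputs, ignoring case and surrounding whitespace.
int countSharedNames(String left, String right) {
  Set<String> names(String text) => text
      .split('\n')
      .map((line) => line.trim().toLowerCase())
      .where((line) => line.isNotEmpty)
      .toSet();
  return names(left).intersection(names(right)).length;
}

/// Thin orchestrator: reads both inputs, then delegates to the pure core.
Future<int> runComparison(
  String leftPath,
  String rightPath, {
  ReadText readText = readFromDisk,
}) async {
  final leftText = await readText(leftPath);
  final rightText = await readText(rightPath);
  return countSharedNames(leftText, rightText);
}
```

In a test you would call `runComparison('left.txt', 'right.txt', readText: (path) async => fakeFiles[path]!)` with a `Map<String, String>` of fake contents; the CLI simply omits `readText`.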

5. Build domain models

Create immutable data classes for people and match results.

enum OutputFormat { table, json, csv, markdown }

class PersonRecord {
    const PersonRecord({
        required this.id,
        required this.givenName,
        required this.surname,
        required this.sex,
        this.birthDate,
        this.birthPlace,
        this.deathDate,
        this.deathPlace,
        this.spouseName,
    });

    final String id;
    final String givenName;
    final String surname;
    final String sex;
    final String? birthDate;
    final String? birthPlace;
    final String? deathDate;
    final String? deathPlace;
    final String? spouseName;

    String get fullName => '$givenName $surname'.trim();
}

class MatchOptions {
    const MatchOptions({
        this.minConfidence = 0,
        this.weightName = 40,
        this.weightBirthDate = 20,
        this.weightBirthPlace = 10,
        this.weightDeathDate = 10,
        this.weightSex = 10,
        this.weightSpouse = 10,
        this.maxCandidates,
    });

    final int minConfidence;
    final int weightName;
    final int weightBirthDate;
    final int weightBirthPlace;
    final int weightDeathDate;
    final int weightSex;
    final int weightSpouse;
    final int? maxCandidates;
}

class MatchResult {
    const MatchResult({
        required this.left,
        required this.right,
        required this.confidence,
        required this.reasons,
    });

    final PersonRecord left;
    final PersonRecord right;
    final int confidence;
    final List<String> reasons;
}
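The adapter in step 6 rebuilds every PersonRecord field by field just to attach a spouse name. A `copyWith` method is a common Dart idiom for deriving a modified copy of an immutable object. The sketch below uses a hypothetical, trimmed-down `Person` class with three fields; note that this simple form cannot reset a field to `null`, because a `null` argument means "keep the current value".

```dart
/// Trimmed-down stand-in for PersonRecord, used only to illustrate copyWith.
class Person {
  const Person({required this.id, required this.fullName, this.spouseName});

  final String id;
  final String fullName;
  final String? spouseName;

  /// Returns a new Person; any argument left as null keeps the current value.
  Person copyWith({String? id, String? fullName, String? spouseName}) {
    return Person(
      id: id ?? this.id,
      fullName: fullName ?? this.fullName,
      spouseName: spouseName ?? this.spouseName,
    );
  }
}
```

With the same method on PersonRecord, the second pass of the adapter could shrink to `record.copyWith(spouseName: ...)`.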

6. Build GEDCOM parsing adapter

Wrap gedcom_parser so your app depends on your own internal model, not on external package types directly.

import 'package:gedcom_parser/gedcom_parser.dart' as gp;

class GedcomAdapter {
    const GedcomAdapter();

    List<PersonRecord> parse(String content) {
        final parser = gp.GedcomParser();
        final lines = content.split(RegExp(r'\r?\n'));
        final data = parser.parseLines(lines);

        final recordsById = <String, PersonRecord>{};
        for (final entry in data.persons.entries) {
            final p = entry.value;
            recordsById[entry.key] = PersonRecord(
                id: p.id,
                givenName: p.firstName,
                surname: p.lastName,
                sex: p.sex,
                birthDate: p.birthDate,
                birthPlace: p.birthPlace,
                deathDate: p.deathDate,
                deathPlace: p.deathPlace,
            );
        }

        final spouseNamesById = <String, Set<String>>{};
        for (final family in data.families.values) {
            _link(spouseNamesById, recordsById, family.husbandId, family.wifeId);
            _link(spouseNamesById, recordsById, family.wifeId, family.husbandId);
        }

        return recordsById.values.map((record) {
            final spouseNames = spouseNamesById[record.id];
            return PersonRecord(
                id: record.id,
                givenName: record.givenName,
                surname: record.surname,
                sex: record.sex,
                birthDate: record.birthDate,
                birthPlace: record.birthPlace,
                deathDate: record.deathDate,
                deathPlace: record.deathPlace,
                spouseName: spouseNames == null || spouseNames.isEmpty
                        ? null
                        : (spouseNames.toList()..sort()).join(', '),
            );
        }).toList(growable: false);
    }

    void _link(
        Map<String, Set<String>> spouseNamesById,
        Map<String, PersonRecord> recordsById,
        String? personId,
        String? spouseId,
    ) {
        if (personId == null || spouseId == null) return;
        final spouse = recordsById[spouseId];
        if (spouse == null || spouse.fullName.isEmpty) return;
        spouseNamesById.putIfAbsent(personId, () => <String>{}).add(spouse.fullName);
    }
}

Why this adapter matters:

  • Easy to swap parser implementation later.
  • Keeps matching logic stable even if parser package changes.
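To see what the adapter hides, here is a deliberately minimal, hand-rolled reader for a small subset of GEDCOM: INDI records with NAME, SEX, and BIRT/DATE/PLAC lines, each of the form `level [@xref@] TAG [value]`. It is not gedcom_parser's API; it ignores families, CONC/CONT continuation lines, and most tags. It only illustrates that a second implementation could sit behind the same adapter. `MiniPerson` and `parseIndividuals` are hypothetical names.

```dart
/// Hypothetical minimal record filled by the sketch parser below.
class MiniPerson {
  MiniPerson(this.id);
  final String id;
  String givenName = '';
  String surname = '';
  String sex = '';
  String? birthDate;
  String? birthPlace;
}

// level, optional @xref@, tag, optional value
final _line = RegExp(r'^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s+(.*))?$');

List<MiniPerson> parseIndividuals(String content) {
  final people = <MiniPerson>[];
  MiniPerson? current; // INDI record being filled, if any
  String? event; // level-1 tag that owns the following level-2 lines

  for (final raw in content.split(RegExp(r'\r?\n'))) {
    final m = _line.firstMatch(raw.trim());
    if (m == null) continue; // skip blank or malformed lines
    final level = int.parse(m.group(1)!);
    final tag = m.group(3)!;
    final value = (m.group(4) ?? '').trim();

    if (level == 0) {
      // A new top-level record starts; only INDI records with an xref are kept.
      final xref = m.group(2);
      current = (tag == 'INDI' && xref != null) ? MiniPerson(xref) : null;
      if (current != null) people.add(current);
      event = null;
    } else if (current == null) {
      continue; // lines belonging to HEAD, FAM, etc. are ignored
    } else if (level == 1) {
      event = tag;
      if (tag == 'NAME') {
        // GEDCOM marks the surname with slashes: "John /Martin/".
        final parts = value.split('/');
        current.givenName = parts.first.trim();
        current.surname = parts.length > 1 ? parts[1].trim() : '';
      } else if (tag == 'SEX') {
        current.sex = value;
      }
    } else if (level == 2 && event == 'BIRT') {
      if (tag == 'DATE') current.birthDate = value;
      if (tag == 'PLAC') current.birthPlace = value;
    }
  }
  return people;
}
```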

7. Add text normalization

Genealogy data is noisy. Normalize names and places before scoring.

class Normalizer {
    const Normalizer();

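    String normalizeText(String? input) {
        if (input == null) return '';
        final lower = input.toLowerCase().trim();
        // Replace punctuation with spaces but keep letters, combining marks,
        // and digits from any script, so "Hélène" or "Müller" stay intact.
        final noPunct =
                lower.replaceAll(RegExp(r'[^\p{L}\p{M}\p{N}\s]', unicode: true), ' ');
        return noPunct.replaceAll(RegExp(r'\s+'), ' ').trim();
    }

    // Placeholder: dates are only text-normalized, so "12 JAN 1900" and
    // "1900-01-12" do not match yet.
    String normalizeDate(String? input) {
        if (input == null) return '';
        return normalizeText(input);
    }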
}

You can later improve this with diacritic removal, nickname maps, and phonetic algorithms.
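As one example of a phonetic algorithm, here is a sketch of classic American Soundex, which maps similar-sounding surnames to the same four-character code (Martin and Martyn both become M635). It only considers ASCII letters, so apply it after any diacritic folding; the function name `soundex` is our own, not part of any package used in this guide.

```dart
/// Classic American Soundex: first letter plus three digits, e.g. "M635".
/// Only ASCII letters A-Z are considered; everything else is dropped.
String soundex(String name) {
  final letters = name.toUpperCase().replaceAll(RegExp(r'[^A-Z]'), '');
  if (letters.isEmpty) return '';

  const codes = <String, String>{
    'B': '1', 'F': '1', 'P': '1', 'V': '1',
    'C': '2', 'G': '2', 'J': '2', 'K': '2',
    'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
    'D': '3', 'T': '3',
    'L': '4',
    'M': '5', 'N': '5',
    'R': '6',
  };

  final out = StringBuffer(letters[0]);
  // Code of the previous letter; the first letter's code still suppresses
  // an identical code right after it (e.g. "Pfister" -> P236).
  var previous = codes[letters[0]] ?? '';
  for (var i = 1; i < letters.length && out.length < 4; i++) {
    final ch = letters[i];
    // H and W are skipped without resetting `previous`, so equal codes on
    // either side of them are written only once (e.g. "Ashcraft" -> A261).
    if (ch == 'H' || ch == 'W') continue;
    // Vowels and Y have no code; they reset `previous`, so a repeated code
    // after a vowel is written again (e.g. "Tymczak" -> T522).
    final code = codes[ch] ?? '';
    if (code.isNotEmpty && code != previous) out.write(code);
    previous = code;
  }
  return out.toString().padRight(4, '0');
}
```

Comparing `soundex(a.surname) == soundex(b.surname)` could then become an extra, lower-weight signal next to the exact name match.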

8. Implement matching engine

A practical weighted matcher:

  • Computes per-field signal.
  • Applies configurable weights.
  • Produces reasons for explainability.
  • Filters by min confidence.

class GedcomMatcher {
    const GedcomMatcher({this.normalizer = const Normalizer()});

    final Normalizer normalizer;

    List<MatchResult> match({
        required List<PersonRecord> leftPeople,
        required List<PersonRecord> rightPeople,
        MatchOptions options = const MatchOptions(),
    }) {
        final results = <MatchResult>[];

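        for (final left in leftPeople) {
            // Score left against every right-hand person; keep scores above the threshold.
            final scored = [
                for (final right in rightPeople) _score(left, right, options),
            ].where((m) => m.confidence >= options.minConfidence).toList();

            // maxCandidates caps how many of the best matches are kept per left person.
            scored.sort((a, b) => b.confidence.compareTo(a.confidence));
            final limit = options.maxCandidates;
            results.addAll(limit == null ? scored : scored.take(limit));
        }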

        results.sort((a, b) => b.confidence.compareTo(a.confidence));
        return results;
    }

    MatchResult _score(
        PersonRecord a,
        PersonRecord b,
        MatchOptions o,
    ) {
        var maxWeight = 0;
        var points = 0;
        final reasons = <String>[];

        void addSignal({
            required int weight,
            required bool matches,
            required String reason,
        }) {
            maxWeight += weight;
            if (matches) {
                points += weight;
                reasons.add(reason);
            }
        }

        final aName = normalizer.normalizeText(a.fullName);
        final bName = normalizer.normalizeText(b.fullName);
        addSignal(
            weight: o.weightName,
            matches: aName.isNotEmpty && aName == bName,
            reason: 'Name matches',
        );

        addSignal(
            weight: o.weightBirthDate,
            matches: normalizer.normalizeDate(a.birthDate) ==
                    normalizer.normalizeDate(b.birthDate) &&
                    normalizer.normalizeDate(a.birthDate).isNotEmpty,
            reason: 'Birth date matches',
        );

        addSignal(
            weight: o.weightBirthPlace,
            matches: normalizer.normalizeText(a.birthPlace) ==
                    normalizer.normalizeText(b.birthPlace) &&
                    normalizer.normalizeText(a.birthPlace).isNotEmpty,
            reason: 'Birth place matches',
        );

        addSignal(
            weight: o.weightDeathDate,
            matches: normalizer.normalizeDate(a.deathDate) ==
                    normalizer.normalizeDate(b.deathDate) &&
                    normalizer.normalizeDate(a.deathDate).isNotEmpty,
            reason: 'Death date matches',
        );

        addSignal(
            weight: o.weightSex,
            matches: normalizer.normalizeText(a.sex) == normalizer.normalizeText(b.sex) &&
                    normalizer.normalizeText(a.sex).isNotEmpty,
            reason: 'Sex matches',
        );

        addSignal(
            weight: o.weightSpouse,
            matches: normalizer.normalizeText(a.spouseName) ==
                    normalizer.normalizeText(b.spouseName) &&
                    normalizer.normalizeText(a.spouseName).isNotEmpty,
            reason: 'Spouse matches',
        );

        final confidence = maxWeight == 0 ? 0 : ((points * 100) / maxWeight).round();

        return MatchResult(
            left: a,
            right: b,
            confidence: confidence,
            reasons: reasons,
        );
    }
}
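With the default weights, the test record in step 12 (name, sex, birth date, and birth place equal; death date and spouse missing on both sides) scores 40 + 20 + 10 + 10 = 80 points out of 100, so 80%, because absent fields still count toward `maxWeight`. If you would rather not penalize sparse records, one option is to leave a field out of the denominator when neither side has a value. The sketch below shows that variant; `Signal` and `confidenceOf` are hypothetical names, and whether this policy suits your data is a judgment call.

```dart
/// One field comparison: its weight plus the two already-normalized values.
class Signal {
  const Signal(this.weight, this.left, this.right);
  final int weight;
  final String left;
  final String right;
}

/// Weighted confidence in percent.
/// When [skipMissing] is true, a field that is empty on both sides is left
/// out of the denominator instead of counting as a failed match.
int confidenceOf(List<Signal> signals, {bool skipMissing = false}) {
  var maxWeight = 0;
  var points = 0;
  for (final s in signals) {
    if (skipMissing && s.left.isEmpty && s.right.isEmpty) continue;
    maxWeight += s.weight;
    if (s.left.isNotEmpty && s.left == s.right) points += s.weight;
  }
  return maxWeight == 0 ? 0 : (points * 100 / maxWeight).round();
}
```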

9. Implement output formatter

You want one source of truth for rendering:

  • table for humans.
  • json/csv/markdown for data export and pipelines.

import 'dart:convert';

class MatchOutputFormatter {
    const MatchOutputFormatter();

    String format(List<MatchResult> matches, OutputFormat format) {
        switch (format) {
            case OutputFormat.table:
                return _table(matches);
            case OutputFormat.json:
                return _json(matches);
            case OutputFormat.csv:
                return _csv(matches);
            case OutputFormat.markdown:
                return _markdown(matches);
        }
    }

    String _table(List<MatchResult> matches) {
        final buffer = StringBuffer();
        buffer.writeln('Confidence | Left Person | Right Person | Reasons');
        buffer.writeln('---------- | ----------- | ------------ | -------');
        for (final m in matches) {
            buffer.writeln(
                '${m.confidence.toString().padLeft(3)}% | '
                '${m.left.fullName} | '
                '${m.right.fullName} | '
                '${m.reasons.join('; ')}',
            );
        }
        return buffer.toString();
    }

    String _json(List<MatchResult> matches) {
        final jsonList = matches.map((m) {
            return {
                'confidence': m.confidence,
                'left': {'id': m.left.id, 'name': m.left.fullName},
                'right': {'id': m.right.id, 'name': m.right.fullName},
                'reasons': m.reasons,
            };
        }).toList(growable: false);

        return const JsonEncoder.withIndent('  ').convert(jsonList);
    }

    String _csv(List<MatchResult> matches) {
        final rows = <String>[
            'confidence,left_id,left_name,right_id,right_name,reasons',
        ];
        for (final m in matches) {
            rows.add(
                '${m.confidence},'
                '${_escape(m.left.id)},'
                '${_escape(m.left.fullName)},'
                '${_escape(m.right.id)},'
                '${_escape(m.right.fullName)},'
                '${_escape(m.reasons.join('|'))}',
            );
        }
        return rows.join('\n');
    }

    String _markdown(List<MatchResult> matches) {
        final rows = <String>[
            '| Confidence | Left | Right | Reasons |',
            '|---:|---|---|---|',
        ];
        for (final m in matches) {
            rows.add(
                '| ${m.confidence}% | '
                '${m.left.fullName} | '
                '${m.right.fullName} | '
                '${m.reasons.join(', ')} |',
            );
        }
        return rows.join('\n');
    }

    String _escape(String value) {
        final q = value.replaceAll('"', '""');
        return '"$q"';
    }
}
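The table output above separates columns with `|` but does not pad them, so columns drift as names vary in length. A minimal sketch of column alignment, using a hypothetical `renderAlignedTable` helper that takes plain string rows (header first) rather than MatchResult objects:

```dart
/// Renders rows as a fixed-width text table; the first row is the header.
/// Assumes every row has the same number of cells as the header.
String renderAlignedTable(List<List<String>> rows) {
  if (rows.isEmpty) return '';
  final columnCount = rows.first.length;
  // Width of each column = length of its longest cell.
  final widths = List<int>.generate(
    columnCount,
    (c) => rows.map((row) => row[c].length).reduce((a, b) => a > b ? a : b),
  );

  String renderRow(List<String> row) => [
        for (var c = 0; c < columnCount; c++) row[c].padRight(widths[c]),
      ].join(' | ').trimRight();

  final buffer = StringBuffer()..writeln(renderRow(rows.first));
  // Separator under the header: one run of dashes per column.
  buffer.writeln([for (final w in widths) '-' * w].join('-+-'));
  for (final row in rows.skip(1)) {
    buffer.writeln(renderRow(row));
  }
  return buffer.toString();
}
```

`_table` could build `[['Confidence', 'Left Person', 'Right Person', 'Reasons'], ...]` from the matches and delegate to a helper like this.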

10. Build CLI argument parsing with args

Example CLI entry logic:

  • Parse options.
  • Validate required positional args.
  • Read input files.
  • Parse and match.
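  • Render/export.

import 'dart:io';

import 'package:args/args.dart';
import 'package:gedcom_matcher_like/gedcom_matcher_like.dart'; // public API from step 11

Future<void> main(List<String> arguments) async {
    final parser = ArgParser()
        ..addMultiOption(
            'format',
            abbr: 'f',
            allowed: ['table', 'json', 'csv', 'markdown'],
            defaultsTo: ['table'],
        )
        ..addOption('output', abbr: 'o', help: 'Write output to this file instead of stdout.')
        ..addOption('min-confidence', defaultsTo: '0')
        ..addOption('max-candidates')
        ..addFlag('no-color', negatable: false)
        ..addFlag('help', abbr: 'h', negatable: false);

    final ArgResults args;
    try {
        args = parser.parse(arguments);
    } on FormatException catch (e) {
        // args throws ArgParserException (a FormatException) for bad options.
        stderr.writeln('Argument error: ${e.message}');
        stderr.writeln(parser.usage);
        exitCode = 64; // EX_USAGE
        return;
    }

    if (args['help'] == true) {
        stdout.writeln('GEDCOM matcher-like CLI');
        stdout.writeln(parser.usage);
        return;
    }

    if (args.rest.length != 2) {
        stderr.writeln('Expected two GEDCOM files: left right');
        stderr.writeln(parser.usage);
        exitCode = 64;
        return;
    }

    // Validate numeric options before reading any files.
    final minConfidence = int.tryParse(args['min-confidence'] as String);
    final maxText = args['max-candidates'] as String?;
    final maxCandidates = maxText == null ? null : int.tryParse(maxText);
    if (minConfidence == null || minConfidence < 0 || minConfidence > 100 ||
            (maxText != null && (maxCandidates == null || maxCandidates < 1))) {
        stderr.writeln('--min-confidence must be 0-100 and --max-candidates at least 1.');
        exitCode = 64;
        return;
    }

    try {
        const adapter = GedcomAdapter();
        final matches = const GedcomMatcher().match(
            leftPeople: adapter.parse(await File(args.rest[0]).readAsString()),
            rightPeople: adapter.parse(await File(args.rest[1]).readAsString()),
            options: MatchOptions(minConfidence: minConfidence, maxCandidates: maxCandidates),
        );

        // Render each requested format, separated by a blank line.
        const formatter = MatchOutputFormatter();
        final rendered = (args['format'] as List<String>)
                .map((name) => formatter.format(matches, OutputFormat.values.byName(name)))
                .join('\n\n');

        final outputPath = args['output'] as String?;
        if (outputPath == null) {
            stdout.writeln(rendered);
        } else {
            await File(outputPath).writeAsString(rendered);
        }
    } on FileSystemException catch (e) {
        stderr.writeln('File error: ${e.path}: ${e.message}');
        exitCode = 66; // EX_NOINPUT (write failures would fit EX_CANTCREAT, 73)
    }
}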

11. Export public API cleanly

Expose only what users need:

  • Adapter.
  • Matcher.
  • Models.
  • Formatter.

Keep internals private and avoid exposing implementation details.
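For example, the package's main library file can re-export only the public types with `show` clauses, while everything under lib/src/ stays an implementation detail. This is a sketch: the lib/src/ file names below are assumptions about how you split the code, so adjust them to your own layout.

```dart
// lib/gedcom_matcher_like.dart
//
// Public entry point of the package. The lib/src/ file names are
// assumptions; rename them to match your project.
library;

export 'src/gedcom_adapter.dart' show GedcomAdapter;
export 'src/matcher.dart' show GedcomMatcher;
export 'src/models.dart'
    show MatchOptions, MatchResult, OutputFormat, PersonRecord;
export 'src/output_formatter.dart' show MatchOutputFormatter;
// Normalizer is not exported here; export it as well if callers should be
// able to pass a custom normalizer to GedcomMatcher.
```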

12. Add deterministic tests

Use fixtures and focused unit tests:

  1. Parser adapter test with minimal GEDCOM input.
  2. Normalizer test.
  3. Matching test with known confidence.
  4. Formatter snapshots for each output format.
  5. CLI behavior tests (help, bad args, happy path).

Example matcher test:

import 'package:gedcom_matcher_like/gedcom_matcher_like.dart';
import 'package:test/test.dart';

void main() {
    test('exact name and birth date yield strong score', () {
        const matcher = GedcomMatcher();

        const left = PersonRecord(
            id: 'A1',
            givenName: 'John',
            surname: 'Martin',
            sex: 'M',
            birthDate: '12 JAN 1900',
            birthPlace: 'Lyon',
        );

        const right = PersonRecord(
            id: 'B1',
            givenName: 'John',
            surname: 'Martin',
            sex: 'M',
            birthDate: '12 JAN 1900',
            birthPlace: 'Lyon',
        );

        final results = matcher.match(
            leftPeople: const [left],
            rightPeople: const [right],
            options: const MatchOptions(minConfidence: 1),
        );

        expect(results, hasLength(1));
        expect(results.first.confidence, greaterThanOrEqualTo(70));
    });
}

Run quality gates:

dart format .
dart analyze
dart test

Source: Dart testing tutorial section

13. Run and use your CLI

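Typical commands (arguments after the script path are passed to your program, not to dart run):

dart run bin/gedcom_matcher_like.dart --help
dart run bin/gedcom_matcher_like.dart path/to/a.ged path/to/b.ged
dart run bin/gedcom_matcher_like.dart --format json --output matches.json path/to/a.ged path/to/b.ged
dart run bin/gedcom_matcher_like.dart --format table --format markdown --min-confidence 70 path/to/a.ged path/to/b.ged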

Key UX tips:

  • Return meaningful process exit codes (the CLI example uses 64 for usage errors).
  • Print actionable argument errors.
  • Keep default output human-readable.
  • Make structured outputs machine-friendly.
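The CLI example uses 64 for usage errors, which follows the BSD `sysexits.h` convention. A small enum keeps such codes in one place. This is a hypothetical helper (`ExitStatus`, `reportFailure`); only the usage-error code comes from the example above, and the other values are the standard sysexits.h numbers.

```dart
import 'dart:io';

/// Exit codes borrowed from the BSD sysexits.h convention.
enum ExitStatus {
  ok(0),
  usage(64), // EX_USAGE: bad command-line arguments
  dataError(65), // EX_DATAERR: input data was malformed
  noInput(66), // EX_NOINPUT: an input file was missing or unreadable
  cantCreate(73); // EX_CANTCREAT: the output file could not be created

  const ExitStatus(this.code);
  final int code;
}

/// Prints [message] to stderr (if given) and records the exit status.
/// Setting dart:io's exitCode and returning normally lets pending output
/// be written, whereas exit() terminates immediately.
void reportFailure(ExitStatus status, [String? message]) {
  if (message != null) stderr.writeln(message);
  exitCode = status.code;
}
```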

14. Hardening for real-world genealogy data

Add these improvements progressively:

  1. Candidate blocking: use surname initials or birth year buckets to avoid full Cartesian comparisons.
  2. Fuzzy scoring: add partial scores for near names and near dates.
  3. Explainability: keep detailed score breakdown to justify each match.
  4. Performance: stream large files where possible and avoid repeated normalization.
  5. Safety: validate that all user numeric inputs are in expected ranges.
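A sketch of the first two ideas, under simplifying assumptions: the blocking key is the surname initial plus the birth decade (people without a four-digit birth year share a "?" bucket, and a typo in the surname initial will hide a true match), and name similarity is Levenshtein edit distance scaled to the range 0 to 1. `blockKey`, `groupByBlock`, `levenshtein`, and `similarity` are hypothetical helpers, not part of any package mentioned in this guide.

```dart
/// Blocking key: surname initial plus birth decade, e.g. "M190".
String blockKey(String surname, String? birthDate) {
  final trimmed = surname.trim();
  final initial = trimmed.isEmpty ? '?' : trimmed[0].toUpperCase();
  final year = RegExp(r'\d{4}').firstMatch(birthDate ?? '')?.group(0);
  final decade = year == null ? '?' : year.substring(0, 3);
  return '$initial$decade';
}

/// Groups items by key so that only items sharing a key are compared.
Map<String, List<T>> groupByBlock<T>(
    Iterable<T> items, String Function(T) keyOf) {
  final blocks = <String, List<T>>{};
  for (final item in items) {
    blocks.putIfAbsent(keyOf(item), () => <T>[]).add(item);
  }
  return blocks;
}

/// Levenshtein distance (two-row dynamic programming) over UTF-16 code units.
int levenshtein(String a, String b) {
  var previous = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final current = List<int>.filled(b.length + 1, 0)..[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a[i - 1] == b[j - 1] ? 0 : 1;
      current[j] = [
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost, // substitution
      ].reduce((x, y) => x < y ? x : y);
    }
    previous = current;
  }
  return previous[b.length];
}

/// Similarity in [0, 1]; 1.0 means identical strings.
double similarity(String a, String b) {
  final longest = a.length > b.length ? a.length : b.length;
  return longest == 0 ? 1.0 : 1 - levenshtein(a, b) / longest;
}
```

A fuzzy signal could then award `(weight * similarity(aName, bName)).round()` points instead of all-or-nothing, and the matcher would only score pairs from the same block.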

15. Packaging and publishing

Before publishing:

  1. Bump version.
  2. Update changelog.
  3. Run format, analyze, test.
  4. Dry run publish.
  5. Publish.

dart pub publish --dry-run
dart pub publish

16. Sources and hyperlinks

Primary documentation:

  1. https://dart.dev/learn/tutorial
  2. https://dart.dev/tools/dart-run
  3. https://pub.dev/packages/args
  4. https://pub.dev/packages/test
  5. https://pub.dev/packages/lints
  6. https://pub.dev/packages/gedcom_parser

Project-specific references used (files in the gedcom_matcher repository):

  1. README.md
  2. pubspec.yaml
  3. lib/src/gedcom_parser.dart

17. Promotional closing: gedcom_matcher and gedcom_parser

If your goal is to move fast without reinventing everything, the existing gedcom_matcher and gedcom_parser packages already provide a strong starting point.

gedcom_matcher:

  • It is a practical, configurable CLI and Dart library for confidence-based GEDCOM person matching.
  • It already supports weighted heuristics, threshold filtering, multiple output formats, and export workflows.
  • It is suitable both for command-line usage and integration in larger Dart applications.

About gedcom_parser:

  • The package is published on pub.dev as gedcom_parser.
  • It provides the GEDCOM parsing layer, which is the foundation any matching tool depends on.
  • In gedcom_matcher, this integration lives in lib/src/gedcom_parser.dart, which isolates the parser dependency from the domain logic, the same adapter approach described in step 6.