1. Prerequisites
Install Dart SDK and verify:
dart --version
Recommended:
- Dart 3.10+.
- Git.
- Optional: FVM if you manage multiple SDKs.
2. Create the package skeleton
Create a console package:
dart create -t console gedcom_matcher_like
cd gedcom_matcher_like
This gives you:
- A package manifest.
- A bin entry point.
- Initial tests.
3. Add dependencies
You need:
- args for CLI options.
- gedcom_parser for GEDCOM parsing.
- test and lints in dev dependencies.
dart pub add args
dart pub add gedcom_parser
dart pub add dev:test
dart pub add dev:lints
dart pub get
4. Define your architecture first
Use this layered structure:
- Core domain models.
- GEDCOM adapter.
- Text normalization.
- Matching/scoring engine.
- Output formatter.
- CLI orchestrator.
Design principle:
- Keep terminal parsing separate from matching logic.
- Make matching logic fully testable without filesystem access.
5. Build domain models
Create immutable data classes for people and match results.
enum OutputFormat { table, json, csv, markdown }
class PersonRecord {
const PersonRecord({
required this.id,
required this.givenName,
required this.surname,
required this.sex,
this.birthDate,
this.birthPlace,
this.deathDate,
this.deathPlace,
this.spouseName,
});
final String id;
final String givenName;
final String surname;
final String sex;
final String? birthDate;
final String? birthPlace;
final String? deathDate;
final String? deathPlace;
final String? spouseName;
String get fullName => ('$givenName $surname').trim();
}
class MatchOptions {
const MatchOptions({
this.minConfidence = 0,
this.weightName = 40,
this.weightBirthDate = 20,
this.weightBirthPlace = 10,
this.weightDeathDate = 10,
this.weightSex = 10,
this.weightSpouse = 10,
this.maxCandidates,
});
final int minConfidence;
final int weightName;
final int weightBirthDate;
final int weightBirthPlace;
final int weightDeathDate;
final int weightSex;
final int weightSpouse;
final int? maxCandidates;
}
class MatchResult {
const MatchResult({
required this.left,
required this.right,
required this.confidence,
required this.reasons,
});
final PersonRecord left;
final PersonRecord right;
final int confidence;
final List<String> reasons;
}
6. Build GEDCOM parsing adapter
Wrap gedcom_parser so your app depends on your own internal model, not on external package types directly.
import 'package:gedcom_parser/gedcom_parser.dart' as gp;
class GedcomAdapter {
const GedcomAdapter();
List<PersonRecord> parse(String content) {
final parser = gp.GedcomParser();
final lines = content.split(RegExp(r'\r?\n'));
final data = parser.parseLines(lines);
final recordsById = <String, PersonRecord>{};
for (final entry in data.persons.entries) {
final p = entry.value;
recordsById[entry.key] = PersonRecord(
id: p.id,
givenName: p.firstName,
surname: p.lastName,
sex: p.sex,
birthDate: p.birthDate,
birthPlace: p.birthPlace,
deathDate: p.deathDate,
deathPlace: p.deathPlace,
);
}
final spouseNamesById = <String, Set<String>>{};
for (final family in data.families.values) {
_link(spouseNamesById, recordsById, family.husbandId, family.wifeId);
_link(spouseNamesById, recordsById, family.wifeId, family.husbandId);
}
return recordsById.values.map((record) {
final spouseNames = spouseNamesById[record.id];
return PersonRecord(
id: record.id,
givenName: record.givenName,
surname: record.surname,
sex: record.sex,
birthDate: record.birthDate,
birthPlace: record.birthPlace,
deathDate: record.deathDate,
deathPlace: record.deathPlace,
spouseName: spouseNames == null || spouseNames.isEmpty
? null
: (spouseNames.toList()..sort()).join(', '),
);
}).toList(growable: false);
}
void _link(
Map<String, Set<String>> spouseNamesById,
Map<String, PersonRecord> recordsById,
String? personId,
String? spouseId,
) {
if (personId == null || spouseId == null) return;
final spouse = recordsById[spouseId];
if (spouse == null || spouse.fullName.isEmpty) return;
spouseNamesById.putIfAbsent(personId, () => <String>{}).add(spouse.fullName);
}
}
Why this adapter matters:
- Easy to swap parser implementation later.
- Keeps matching logic stable even if parser package changes.
7. Add text normalization
Genealogy data is noisy. Normalize names and places before scoring.
class Normalizer {
const Normalizer();
String normalizeText(String? input) {
if (input == null) return '';
final lower = input.toLowerCase().trim();
final noPunct = lower.replaceAll(RegExp(r'[^a-z0-9\s]'), ' ');
return noPunct.replaceAll(RegExp(r'\s+'), ' ').trim();
}
String normalizeDate(String? input) {
if (input == null) return '';
return normalizeText(input);
}
}
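Note that the [^a-z0-9\s] filter replaces accented letters with spaces, so "José" normalizes to "jos". You can later improve this with diacritic folding, nickname maps, and phonetic algorithms.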
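As a rough illustration of the first two improvements, the sketch below folds a handful of accented letters to ASCII and maps nicknames to a canonical given name. Both tables and both function names (foldDiacritics, canonicalGivenName) are hypothetical and deliberately tiny; they are not part of gedcom_parser or of the Normalizer above. The sketch assumes precomposed characters and would need Unicode normalization to handle combining accents.

```dart
/// Hypothetical fold table: a few precomposed accented letters to ASCII.
/// Extend it to cover the languages present in your data.
const _folds = <String, String>{
  'à': 'a', 'á': 'a', 'â': 'a', 'ä': 'a',
  'ç': 'c',
  'è': 'e', 'é': 'e', 'ê': 'e', 'ë': 'e',
  'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i',
  'ñ': 'n',
  'ò': 'o', 'ó': 'o', 'ô': 'o', 'ö': 'o',
  'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'u',
};

/// Hypothetical nickname table: nickname -> canonical given name.
const _nicknames = <String, String>{
  'bill': 'william',
  'will': 'william',
  'peggy': 'margaret',
};

/// Lowercases [input] and replaces every letter found in [_folds].
String foldDiacritics(String input) {
  final buffer = StringBuffer();
  for (final rune in input.toLowerCase().runes) {
    final ch = String.fromCharCode(rune);
    buffer.write(_folds[ch] ?? ch);
  }
  return buffer.toString();
}

/// Folds [input], then swaps a known nickname for its canonical form.
String canonicalGivenName(String input) {
  final folded = foldDiacritics(input).trim();
  return _nicknames[folded] ?? folded;
}

void main() {
  print(foldDiacritics('José Müller')); // jose muller
  print(canonicalGivenName('Bill')); // william
}
```

If you adopt something like this, the fold has to run before the ASCII-only filter in normalizeText; otherwise the accented letters are already gone by the time the table is consulted.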
8. Implement matching engine
A practical weighted matcher:
- Computes per-field signal.
- Applies configurable weights.
- Produces reasons for explainability.
- Filters by min confidence.
class GedcomMatcher {
const GedcomMatcher({this.normalizer = const Normalizer()});
final Normalizer normalizer;
List<MatchResult> match({
required List<PersonRecord> leftPeople,
required List<PersonRecord> rightPeople,
MatchOptions options = const MatchOptions(),
}) {
final results = <MatchResult>[];
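// The candidate list does not depend on `left`, so build it once.
// Note: maxCandidates keeps the first N right-hand records; it does not rank them.
final maxCandidates = options.maxCandidates;
final candidates = maxCandidates == null
? rightPeople
: rightPeople.take(maxCandidates).toList(growable: false);
for (final left in leftPeople) {
for (final right in candidates) {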
final scored = _score(left, right, options);
if (scored.confidence >= options.minConfidence) {
results.add(scored);
}
}
}
results.sort((a, b) => b.confidence.compareTo(a.confidence));
return results;
}
MatchResult _score(
PersonRecord a,
PersonRecord b,
MatchOptions o,
) {
var maxWeight = 0;
var points = 0;
final reasons = <String>[];
void addSignal({
required int weight,
required bool matches,
required String reason,
}) {
maxWeight += weight;
if (matches) {
points += weight;
reasons.add(reason);
}
}
final aName = normalizer.normalizeText(a.fullName);
final bName = normalizer.normalizeText(b.fullName);
addSignal(
weight: o.weightName,
matches: aName.isNotEmpty && aName == bName,
reason: 'Name matches',
);
addSignal(
weight: o.weightBirthDate,
matches: normalizer.normalizeDate(a.birthDate) ==
normalizer.normalizeDate(b.birthDate) &&
normalizer.normalizeDate(a.birthDate).isNotEmpty,
reason: 'Birth date matches',
);
addSignal(
weight: o.weightBirthPlace,
matches: normalizer.normalizeText(a.birthPlace) ==
normalizer.normalizeText(b.birthPlace) &&
normalizer.normalizeText(a.birthPlace).isNotEmpty,
reason: 'Birth place matches',
);
addSignal(
weight: o.weightDeathDate,
matches: normalizer.normalizeDate(a.deathDate) ==
normalizer.normalizeDate(b.deathDate) &&
normalizer.normalizeDate(a.deathDate).isNotEmpty,
reason: 'Death date matches',
);
addSignal(
weight: o.weightSex,
matches: normalizer.normalizeText(a.sex) == normalizer.normalizeText(b.sex) &&
normalizer.normalizeText(a.sex).isNotEmpty,
reason: 'Sex matches',
);
addSignal(
weight: o.weightSpouse,
matches: normalizer.normalizeText(a.spouseName) ==
normalizer.normalizeText(b.spouseName) &&
normalizer.normalizeText(a.spouseName).isNotEmpty,
reason: 'Spouse matches',
);
final confidence = maxWeight == 0 ? 0 : ((points * 100) / maxWeight).round();
return MatchResult(
left: a,
right: b,
confidence: confidence,
reasons: reasons,
);
}
}
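To see where a confidence figure comes from, it can help to recompute one by hand. The sketch below isolates the same arithmetic as _score: every weight goes into the denominator whether or not the field matched, and matched weights go into the numerator. The confidence function and the string keys are illustrative only and do not appear in the matcher itself.

```dart
/// Recomputes the matcher's percentage from a weight table and the
/// set of signals that matched. Illustrative helper, not matcher API.
int confidence(Map<String, int> weights, Set<String> matched) {
  final maxWeight = weights.values.fold<int>(0, (sum, w) => sum + w);
  if (maxWeight == 0) return 0;
  final points = weights.entries
      .where((e) => matched.contains(e.key))
      .fold<int>(0, (sum, e) => sum + e.value);
  return ((points * 100) / maxWeight).round();
}

void main() {
  // Default weights from MatchOptions.
  const weights = {
    'name': 40,
    'birthDate': 20,
    'birthPlace': 10,
    'deathDate': 10,
    'sex': 10,
    'spouse': 10,
  };
  // Name, birth date, birth place and sex agree: (40+20+10+10) / 100.
  print(confidence(weights, {'name', 'birthDate', 'birthPlace', 'sex'})); // 80
  // Only sex agrees.
  print(confidence(weights, {'sex'})); // 10
}
```

One consequence of this design: a field that is missing on either side still counts toward the denominator, so two otherwise identical records with no death date or spouse recorded top out at 80, not 100.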
9. Implement output formatter
You want one source of truth for rendering:
- table for humans.
- json/csv/markdown for data export and pipelines.
import 'dart:convert';
class MatchOutputFormatter {
const MatchOutputFormatter();
String format(List<MatchResult> matches, OutputFormat format) {
switch (format) {
case OutputFormat.table:
return _table(matches);
case OutputFormat.json:
return _json(matches);
case OutputFormat.csv:
return _csv(matches);
case OutputFormat.markdown:
return _markdown(matches);
}
}
String _table(List<MatchResult> matches) {
final buffer = StringBuffer();
buffer.writeln('Confidence | Left Person | Right Person | Reasons');
buffer.writeln('---------- | ----------- | ------------ | -------');
for (final m in matches) {
buffer.writeln(
'${m.confidence.toString().padLeft(3)}% | '
'${m.left.fullName} | '
'${m.right.fullName} | '
'${m.reasons.join('; ')}',
);
}
return buffer.toString();
}
String _json(List<MatchResult> matches) {
final jsonList = matches.map((m) {
return {
'confidence': m.confidence,
'left': {'id': m.left.id, 'name': m.left.fullName},
'right': {'id': m.right.id, 'name': m.right.fullName},
'reasons': m.reasons,
};
}).toList(growable: false);
return const JsonEncoder.withIndent(' ').convert(jsonList);
}
String _csv(List<MatchResult> matches) {
final rows = <String>[
'confidence,left_id,left_name,right_id,right_name,reasons',
];
for (final m in matches) {
rows.add(
'${m.confidence},'
'${_escape(m.left.id)},'
'${_escape(m.left.fullName)},'
'${_escape(m.right.id)},'
'${_escape(m.right.fullName)},'
'${_escape(m.reasons.join('|'))}',
);
}
return rows.join('\n');
}
String _markdown(List<MatchResult> matches) {
final rows = <String>[
'| Confidence | Left | Right | Reasons |',
'|---:|---|---|---|',
];
for (final m in matches) {
rows.add(
'| ${m.confidence}% | '
'${m.left.fullName} | '
'${m.right.fullName} | '
'${m.reasons.join(', ')} |',
);
}
return rows.join('\n');
}
String _escape(String value) {
final q = value.replaceAll('"', '""');
return '"$q"';
}
}
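One gap worth knowing about: the table and markdown renderers interpolate names and reasons directly, so a value containing a pipe or a line break would split the row. A minimal sketch of a cell-escaping helper follows; escapeMarkdownCell is a hypothetical name, and the backslash-pipe escape assumes a GitHub-flavored markdown consumer.

```dart
/// Hypothetical helper: makes [value] safe to place inside one
/// markdown table cell by escaping pipes and flattening line breaks.
String escapeMarkdownCell(String value) {
  return value
      .replaceAll('|', r'\|')
      .replaceAll(RegExp(r'\r?\n'), ' ');
}

void main() {
  print(escapeMarkdownCell('Smith | Jones')); // Smith \| Jones
  print(escapeMarkdownCell('line one\nline two')); // line one line two
}
```

The CSV renderer already has its own _escape for quotes, so a helper like this would only be applied in _markdown and, if desired, _table.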
10. Build CLI argument parsing with args
Example CLI entry logic:
- Parse options.
- Validate required positional args.
- Read input files.
- Parse and match.
- Render/export.
import 'dart:io';
import 'package:args/args.dart';
void main(List<String> arguments) {
final parser = ArgParser()
..addMultiOption(
'format',
abbr: 'f',
allowed: ['table', 'json', 'csv', 'markdown'],
defaultsTo: ['table'],
)
..addOption('output', abbr: 'o')
..addOption('min-confidence', defaultsTo: '0')
..addOption('max-candidates')
..addFlag('no-color', negatable: false)
..addFlag('help', abbr: 'h', negatable: false);
late ArgResults args;
try {
args = parser.parse(arguments);
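} on FormatException catch (e) {
// package:args reports malformed input as a FormatException.
stderr.writeln('Argument error: ${e.message}');
stderr.writeln(parser.usage);
exitCode = 64; // EX_USAGE: command-line usage error.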
return;
}
if (args['help'] == true) {
stdout.writeln('GEDCOM matcher-like CLI');
stdout.writeln(parser.usage);
return;
}
if (args.rest.length != 2) {
stderr.writeln('Expected two GEDCOM files: left right');
stderr.writeln(parser.usage);
exitCode = 64;
return;
}
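// Remaining steps: read both files, run GedcomAdapter.parse on each,
// call GedcomMatcher.match, then render with MatchOutputFormatter.
}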
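Option values such as args['min-confidence'] arrive as strings, so they need converting and range-checking before they reach MatchOptions. The helper below is one possible sketch of that step; parseBoundedInt is a hypothetical name and is not provided by package:args.

```dart
/// Hypothetical helper, not part of package:args.
/// Returns the parsed value, or null when [raw] is missing,
/// is not an integer, or falls outside [min]..[max].
int? parseBoundedInt(String? raw, {required int min, required int max}) {
  if (raw == null) return null;
  final value = int.tryParse(raw.trim());
  if (value == null || value < min || value > max) return null;
  return value;
}

void main() {
  print(parseBoundedInt('70', min: 0, max: 100)); // 70
  print(parseBoundedInt('150', min: 0, max: 100)); // null
  print(parseBoundedInt('abc', min: 0, max: 100)); // null
}
```

In the entry point, a null result for min-confidence would be treated like any other argument error: print a message, print parser.usage, and set exitCode to 64. Because max-candidates has no default, check whether it was supplied before validating it, so that an absent option is not reported as an invalid one.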
11. Export public API cleanly
Expose only what users need:
- Adapter.
- Matcher.
- Models.
- Formatter.
Keep internals private and avoid exposing implementation details.
12. Add deterministic tests
Use fixtures and focused unit tests:
- Parser adapter test with minimal GEDCOM input.
- Normalizer test.
- Matching test with known confidence.
- Formatter snapshots for each output format.
- CLI behavior tests (help, bad args, happy path).
Example matcher test:
import 'package:test/test.dart';
void main() {
test('exact name and birth date yield strong score', () {
const matcher = GedcomMatcher();
const left = PersonRecord(
id: 'A1',
givenName: 'John',
surname: 'Martin',
sex: 'M',
birthDate: '12 JAN 1900',
birthPlace: 'Lyon',
);
const right = PersonRecord(
id: 'B1',
givenName: 'John',
surname: 'Martin',
sex: 'M',
birthDate: '12 JAN 1900',
birthPlace: 'Lyon',
);
final results = matcher.match(
leftPeople: const [left],
rightPeople: const [right],
options: const MatchOptions(minConfidence: 1),
);
expect(results, hasLength(1));
expect(results.first.confidence, greaterThanOrEqualTo(70));
});
}
Run quality gates:
dart format .
dart analyze
dart test
Source: Dart testing tutorial section
13. Run and use your CLI
Typical commands:
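dart run bin/gedcom_matcher_like.dart --help
dart run bin/gedcom_matcher_like.dart path/to/a.ged path/to/b.ged
dart run bin/gedcom_matcher_like.dart --format json --output matches.json path/to/a.ged path/to/b.ged
dart run bin/gedcom_matcher_like.dart --format table --format markdown --min-confidence 70 path/to/a.ged path/to/b.ged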
Key UX tips:
- Return process exit codes.
- Print actionable argument errors.
- Keep default output human-readable.
- Make structured outputs machine-friendly.
14. Hardening for real-world genealogy data
Add these improvements progressively:
- Candidate blocking: use surname initials or birth-year buckets to avoid comparing every left record with every right record (a full Cartesian product).
- Fuzzy scoring: add partial scores for near names and near dates.
- Explainability: keep detailed score breakdown to justify each match.
- Performance: stream large files where possible and avoid repeated normalization.
- Safety: validate that all user numeric inputs are in expected ranges.
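The candidate-blocking idea from the list above can be sketched as follows. The Person class is a reduced stand-in for PersonRecord, and blockKey and candidatePairs are hypothetical names; the key used here (first surname letter plus birth decade) is only one of many possible choices.

```dart
/// Minimal stand-in for PersonRecord, reduced to the fields
/// the blocking key needs.
class Person {
  const Person(this.id, this.surname, this.birthDate);
  final String id;
  final String surname;
  final String? birthDate;
}

/// Finds the first standalone four-digit number, taken as the year.
final _year = RegExp(r'\b(\d{4})\b');

/// Hypothetical blocking key: first surname letter plus birth decade.
String blockKey(Person p) {
  final s = p.surname.trim().toLowerCase();
  final initial = s.isEmpty ? '?' : s[0];
  final date = p.birthDate;
  final match = date == null ? null : _year.firstMatch(date);
  final decade =
      match == null ? '????' : '${match.group(1)!.substring(0, 3)}0';
  return '$initial:$decade';
}

/// Pairs each left person only with right people in the same block.
Iterable<(Person, Person)> candidatePairs(
  List<Person> left,
  List<Person> right,
) sync* {
  // Index the right-hand side once by block key.
  final index = <String, List<Person>>{};
  for (final r in right) {
    index.putIfAbsent(blockKey(r), () => []).add(r);
  }
  for (final l in left) {
    for (final r in index[blockKey(l)] ?? const <Person>[]) {
      yield (l, r);
    }
  }
}

void main() {
  const left = [Person('A1', 'Martin', '12 JAN 1900')];
  const right = [
    Person('B1', 'Martin', 'ABT 1903'),
    Person('B2', 'Dupont', '1900'),
    Person('B3', 'Moreau', '3 MAR 1850'),
  ];
  for (final (l, r) in candidatePairs(left, right)) {
    print('${l.id} <-> ${r.id}'); // A1 <-> B1
  }
}
```

Blocking trades recall for speed: a mistyped first letter, a birth year on the other side of a decade boundary (1899 versus 1900), or a missing date on one side only will put two records in different blocks, and they will never be scored. Comparing against neighboring blocks as well is a common way to soften this.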
15. Packaging and publishing
Before publishing:
- Bump version.
- Update changelog.
- Run format, analyze, test.
- Dry run publish.
- Publish.
dart pub publish --dry-run
dart pub publish
16. Sources and hyperlinks
Primary documentation:
- https://dart.dev/learn/tutorial
- https://dart.dev/tools/dart-run
- https://pub.dev/packages/args
- https://pub.dev/packages/test
- https://pub.dev/packages/lints
- https://pub.dev/packages/gedcom_parser
17. Promotional closing: gedcom_matcher and gedcom_parser
If your goal is to move fast without reinventing everything, your current stack is already very strong.
gedcom_matcher:
- It is a practical, configurable CLI and Dart library for confidence-based GEDCOM person matching.
- It already supports weighted heuristics, threshold filtering, multiple output formats, and export workflows.
- It is suitable both for command-line usage and integration in larger Dart applications.
About gedcom_parser:
- The package name on pub.dev is gedcom_parser.
- It provides robust GEDCOM parsing capabilities and is exactly the kind of reliable foundation you want for a matching tool.
- In your project, this integration is cleanly handled in lib/src/gedcom_parser.dart, which is a great architecture choice because it isolates parser dependency from your domain logic.