Consumption Tax Accounting Pattern Extraction and Viewer Development Log - From Schema Design to Sequential ID Checks

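A record of extracting accounting-treatment patterns from PDF page images of a consumption tax manual (about 250 pages, 886 cases), converting them into TypeScript data, and making them browsable in a viewer.

1. Schema Design

Design Principles

Faithfully reproduce the structure of the original manual
Take advantage of TypeScript's type safety
Keep the structure easy to search and filter

Main Schema (consumption-tax-pattern-schema.ts)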

// internal/data/consumption-tax-pattern-schema.ts

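/** Account category */
export interface AccountCategory {
  id: number;              // e.g. 1, 2, 14, 22, 60
  name: string;            // e.g. "売上高" (sales), "広告宣伝費" (advertising expenses)
  type: "revenue" | "expense";
}

/** Subsection type */
export type SubSectionType =
  | "経理処理例"                    // (1) accounting examples; common to both sides
  | "簡易課税制度の事業区分"         // (2) revenue side: business category under the simplified tax system
  | "個別対応方式の用途区分";        // (2) expense side: usage category under the itemized method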

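/** Business category under the simplified tax system */
export type SimplifiedTaxBusinessType =
  | "第一種"   // Type 1: wholesale
  | "第二種"   // Type 2: retail
  | "第三種"   // Type 3: manufacturing, etc.
  | "第四種"   // Type 4: other
  | "第五種"   // Type 5: services, etc.
  | "第六種";  // Type 6: real estate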

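/** Tax category (base) */
export type TaxCategoryBase =
  | "課税売上"       // taxable sales
  | "課税仕入"       // taxable purchases
  | "課税売上返還"   // returns on taxable sales
  | "課税仕入返還"   // returns on taxable purchases
  | "輸出免税売上"   // export-exempt sales
  | "非課税売上"     // non-taxable sales
  | "不課税売上"     // out-of-scope sales
  | "対象外";        // not applicable

/** Usage category (itemized method) */
export type UsageCategory =
  | "課税売上対応"     // attributable to taxable sales
  | "非課税売上対応"   // attributable to non-taxable sales
  | "共通対応";        // common to both

/** Tax rate */
export type TaxRate = "10%" | "8%軽減";  // standard 10% or reduced 8%

/** Deduction rate (transitional measure) */
export type DeductionRate = "80%控除" | "50%控除";  // 80% or 50% deductible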

/** One side of a journal entry (debit or credit) */
export interface JournalSide {
  accountTitle: string;
  subAccount?: string;
  taxCategoryBase: TaxCategoryBase;
  taxRate?: TaxRate;
  simplifiedBusinessType?: SimplifiedTaxBusinessType;
  usageCategory?: UsageCategory;
  deductionRate?: DeductionRate;
  amount: number;
}

/** Journal entry line */
export interface JournalEntryLine {
  debit: JournalSide;
  credit: JournalSide;
  description?: string;
}

/** Journal pattern */
export interface JournalPattern {
  condition?: string;  // e.g. "インボイスを保存している" (a qualified invoice is retained)
  entries: JournalEntryLine[];
}

/** Accounting example (main entity) */
export interface AccountingExample {
  id: string;                // "001" to "886"
  categoryId: number;
  categoryName: string;
  subSectionType: SubSectionType;
  problem: string;           // problem statement
  patterns: JournalPattern[];
  notes?: string[];
  sourcePageNumber: number;
}

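Schema Structure

Account type	(1)	(2)
Revenue side (sales, etc.)	経理処理例 (accounting examples)	簡易課税制度の事業区分 (business category under the simplified tax system)
Expense side (purchases, expenses, etc.)	経理処理例 (accounting examples)	個別対応方式の用途区分 (usage category under the itemized method)

Cases are numbered 001 through 886, and a single case may carry several patterns when it branches on a condition (for example, whether a qualified invoice is retained).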
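
The table above can be read as a lookup from account type to its two subsections. A minimal sketch of that lookup follows; the helper name subSectionsFor is hypothetical and does not appear in the original schema file, and the types are repeated here only to keep the snippet self-contained.

```typescript
// Types repeated from consumption-tax-pattern-schema.ts so the snippet stands alone.
type AccountType = "revenue" | "expense";
type SubSectionType =
  | "経理処理例"
  | "簡易課税制度の事業区分"
  | "個別対応方式の用途区分";

// Hypothetical helper (not in the original code): returns subsections (1) and (2)
// for the given side of the table.
function subSectionsFor(type: AccountType): [SubSectionType, SubSectionType] {
  return type === "revenue"
    ? ["経理処理例", "簡易課税制度の事業区分"]
    : ["経理処理例", "個別対応方式の用途区分"];
}
```

Subsection (1) is the same on both sides; only subsection (2) depends on whether the account is a revenue or an expense account.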

2. The extract-tax-patterns Skill

Using the Slash Command

/extract-tax-patterns [start page] [end page]

Example:

/extract-tax-patterns 184 203

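Parallel Sub-agents (5 Pages Each)

Extracting data from images is expensive, so the page range is split into 5-page chunks and each chunk is handed to a sub-agent that runs in parallel.

Example: /extract-tax-patterns 184 203 (20 pages)
- Sub-agent 1: 184-188 (5 pages)
- Sub-agent 2: 189-193 (5 pages)
- Sub-agent 3: 194-198 (5 pages)
- Sub-agent 4: 199-203 (5 pages)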
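
The split above can be sketched as a small chunking function. This is an illustration under assumptions, not the skill's actual implementation: the names PageRange, chunkPages, and outputPathFor are hypothetical, and the output path simply follows the pages_{start}-{end}.ts naming used for the sub-agent output files.

```typescript
interface PageRange {
  start: number;
  end: number;
}

// Hypothetical helper: split an inclusive page range into chunks of `size` pages.
// The last chunk may be shorter when the range is not a multiple of `size`.
function chunkPages(start: number, end: number, size = 5): PageRange[] {
  if (
    !Number.isInteger(start) ||
    !Number.isInteger(end) ||
    !Number.isInteger(size) ||
    start > end ||
    size < 1
  ) {
    throw new RangeError(`invalid page range: ${start}-${end} (size ${size})`);
  }
  const ranges: PageRange[] = [];
  for (let s = start; s <= end; s += size) {
    ranges.push({ start: s, end: Math.min(s + size - 1, end) });
  }
  return ranges;
}

// Hypothetical helper: output file for one chunk.
function outputPathFor(range: PageRange): string {
  return `internal/data/extracted/pages_${range.start}-${range.end}.ts`;
}
```

Calling chunkPages(184, 203) yields the four ranges listed above, one per sub-agent.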

Prompt given to each sub-agent:

Extract data from the images of consumption tax accounting patterns.

## Target files
- Directory: internal/data/{manual image directory}/
- File name: {manual name}_ページ_{number}.jpg
- Range: pages {start} to {end}

## Schema
See internal/data/consumption-tax-pattern-schema.ts

## Output
Output the extracted data as a TypeScript array
Output file: internal/data/extracted/pages_{start}-{end}.ts

File Layout

internal/data/
├── consumption-tax-pattern-schema.ts  # schema definition
├── extracted/                          # sub-agent output
│   ├── pages_183-187.ts
│   ├── pages_188-192.ts
│   ├── pages_193-197.ts
│   └── ... (about 50 files)
└── {manual image directory}/  # source images
    ├── ページ_183.jpg
    └── ...

3. The tax-patterns Viewer Page

Miller Columns Layout (4 Columns)

Hierarchical navigation inspired by the column view in the macOS Finder.

┌─────────────────────┬──────────────────┬──────────┬──────────────────────────────────┐
│ Account             │ Section          │ Page     │ Detail                           │
│ (360px)             │ (240px)          │ (80px)   │ (1fr)                            │
├─────────────────────┼──────────────────┼──────────┼──────────────────────────────────┤
│ [1] Sales           │ - Examples       │ p.183    │ ◆ Problem statement...           │
│ [2] Sales discounts │ - Simplified tax │ p.184    │                                  │
│ ...                 │                  │ p.185    │ [No.001]                         │
│                     │                  │          │ ┌─Debit─┬─Credit─┬─Description─┐ │
└─────────────────────┴──────────────────┴──────────┴──────────────────────────────────┘

CSS Grid Definition

.miller-columns {
  display: grid;
  grid-template-columns: 360px 240px 80px 1fr;
  flex: 1;
  overflow: hidden;
}

Keyboard Navigation

The arrow keys move between pages:

function handleKeydown(e: KeyboardEvent) {
  const activeElement = document.activeElement;
  // Ignore key presses while an input/textarea/select has focus
  if (activeElement &&
      (activeElement.tagName === 'INPUT' ||
       activeElement.tagName === 'TEXTAREA' ||
       activeElement.tagName === 'SELECT')) {
    return;
  }
  if (e.key === 'ArrowLeft' || e.key === 'ArrowUp') {
    e.preventDefault();
    goPrev();
  } else if (e.key === 'ArrowRight' || e.key === 'ArrowDown') {
    e.preventDefault();
    goNext();
  }
}
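
goPrev and goNext are not shown in this log. One way they could work, assuming the pages of the current subsection are held in a sorted array, is to move one step and stop at either end. The helper name stepPage and the array-based model are assumptions made for illustration.

```typescript
// Hypothetical helper: returns the page to show after moving by `delta`
// (-1 for previous, +1 for next), clamped to the first and last page.
function stepPage(pages: number[], current: number, delta: -1 | 1): number {
  const index = pages.indexOf(current);
  if (index === -1) {
    // Unknown current page: fall back to the first page if there is one.
    return pages[0] ?? current;
  }
  const next = Math.min(Math.max(index + delta, 0), pages.length - 1);
  return pages[next] ?? current;
}
```

With this shape, goPrev would call stepPage(pages, current, -1) and goNext would call stepPage(pages, current, 1), so pressing an arrow key on the first or last page leaves the selection unchanged.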

Saving the Position in Local Storage

The viewing position is remembered even after the browser is closed:

const STORAGE_KEY = 'tax-patterns-position';

function savePosition() {
  if (selectedCategoryId.value && selectedSubSection.value && selectedPageNumber.value) {
    const position = {
      categoryId: selectedCategoryId.value,
      subSection: selectedSubSection.value,
      pageNumber: selectedPageNumber.value,
    };
    localStorage.setItem(STORAGE_KEY, JSON.stringify(position));
  }
}

function restorePosition(): boolean {
  const saved = localStorage.getItem(STORAGE_KEY);
  if (!saved) return false;

  try {
    const position = JSON.parse(saved);
    // Check that the saved position is still valid
    const categoryExists = categories.value.some(c => c.id === position.categoryId);
    if (!categoryExists) return false;
    // ... restore logic
    return true;
  } catch {
    return false;
  }
}
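
Because JSON.parse returns an untyped value, the saved position may be worth validating before it is used. A minimal sketch follows, assuming the stored object has exactly the three fields written by savePosition; the names SavedPosition and parseSavedPosition are hypothetical, and the field types are inferred from the schema rather than confirmed by the original code.

```typescript
// Assumed shape of the object written by savePosition.
interface SavedPosition {
  categoryId: number;
  subSection: string;
  pageNumber: number;
}

// Hypothetical helper: returns the parsed position, or null when the stored
// string is missing, is not valid JSON, or has fields of the wrong type.
function parseSavedPosition(raw: string | null): SavedPosition | null {
  if (!raw) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const v = value as Record<string, unknown>;
    if (
      typeof v.categoryId === "number" &&
      typeof v.subSection === "string" &&
      typeof v.pageNumber === "number"
    ) {
      return {
        categoryId: v.categoryId,
        subSection: v.subSection,
        pageNumber: v.pageNumber,
      };
    }
    return null;
  } catch {
    return null;
  }
}
```

restorePosition could pass localStorage.getItem(STORAGE_KEY) through such a helper and then run its existing category check on the result.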

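Grouping by Problem Statement

Cases that share the same problem statement (for example, with and without a qualified invoice) are grouped for display: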

interface ProblemGroup {
  problem: string;
  examples: AccountingExample[];
  notes: string[];
}

const groupedByProblem = computed((): ProblemGroup[] => {
  const groups: ProblemGroup[] = [];
  const problemMap = new Map<string, ProblemGroup>();

  pageExamples.value.forEach(ex => {
    const existing = problemMap.get(ex.problem);
    if (existing) {
      existing.examples.push(ex);
      // Merge notes, skipping duplicates
      if (ex.notes) {
        ex.notes.forEach(note => {
          if (!existing.notes.includes(note)) {
            existing.notes.push(note);
          }
        });
      }
    } else {
      const group: ProblemGroup = {
        problem: ex.problem,
        examples: [ex],
        notes: ex.notes ? [...ex.notes] : [],
      };
      problemMap.set(ex.problem, group);
      groups.push(group);
    }
  });

  return groups;
});

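Fixed Table Widths (px Values for Column Alignment)

Column widths are given in px rather than percentages so that the columns line up across multiple tables: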

.journal-table .col-account {
  width: 99px;
}

.journal-table .col-sub {
  width: 88px;
}

.journal-table .col-tax {
  width: 121px;
}

.journal-table .col-amount {
  width: 110px;
}

.journal-table .col-desc {
  width: 300px;
}

The white gap beside the tables was sent to OpenAI Codex for review, and the following fix was adopted:

.detail-content {
  --pad-inline: 1rem;
  padding: 1rem 0 1rem 1.5rem; /* zero padding on the right side only */
  overflow-y: auto;
  scrollbar-gutter: stable;
}

.detail-inner {
  padding-right: var(--pad-inline);
}

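4. Splitting IDs and Renumbering

Problems Found

The initial extraction produced the following problems:

ID format: IDs such as "16-1-323" include the topic number
- Correct format: "323" (journal entry number only)
sourcePageNumber: the field holds the journal entry number
- Current: sourcePageNumber: 397 (journal entry number)
- Correct: sourcePageNumber: 278 (image file number)
Duplicated problem text: the invoice condition appears in both the problem statement and condition

Fixes

ID fix (22 files, about 362 entries)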

// Before
id: "18-1-397",

// After
id: "397",
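
The ID fix amounts to keeping only the last hyphen-separated segment. A minimal sketch, assuming a hypothetical helper normalizeExampleId; zero-padding to three digits is also an assumption, based on the "001" to "886" format noted in the schema.

```typescript
// Hypothetical helper: "18-1-397" -> "397". IDs that are already bare
// journal entry numbers pass through unchanged.
function normalizeExampleId(rawId: string): string {
  const last = rawId.split("-").pop() ?? "";
  if (!/^\d+$/.test(last)) {
    throw new Error(`unexpected ID format: ${rawId}`);
  }
  // Assumption: IDs are zero-padded to three digits, as in "001".
  return last.padStart(3, "0");
}
```

Throwing on anything that does not end in digits keeps malformed IDs from slipping through a bulk rewrite unnoticed.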

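sourcePageNumber fix

// Before (does not match the page number in the comment)
// Page 278 (printed page 260): patterns 397-402
sourcePageNumber: 397,  // holds the journal entry number

// After
sourcePageNumber: 278,  // image file number

Formula: image file number = printed page number + 18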
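
The formula can be captured in a pair of small conversion functions. The function and constant names below are hypothetical; the offset of 18 comes from the formula above.

```typescript
// Offset between the printed page number and the image file number.
const IMAGE_PAGE_OFFSET = 18;

// Hypothetical helpers for converting between the two numbering schemes.
function toImagePageNumber(printedPage: number): number {
  return printedPage + IMAGE_PAGE_OFFSET;
}

function toPrintedPageNumber(imagePage: number): number {
  return imagePage - IMAGE_PAGE_OFFSET;
}
```

For the example above, printed page 260 maps to image file 278, which is the value sourcePageNumber should hold.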

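Removing duplicated text from problem statements (115 entries)

Patterns removed:

インボイスを保存している。 (A qualified invoice is retained.)
請求書等（インボイスに該当しない）を保存している。 (An invoice or similar document that is not a qualified invoice is retained.)
当社は少額特例の対象となる中小事業者である。 (The company is a small or medium-sized business eligible for the small-amount special provision.)
（インボイスを保存している。） (the first pattern wrapped in full-width parentheses)
（請求書等（インボイスに該当しない）を保存している。） (the second pattern wrapped in full-width parentheses)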
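
The removal can be sketched as a literal string replacement over the five patterns listed above. This is an illustration, not the script actually used; the names DUPLICATED_CONDITIONS and stripInvoiceConditions are hypothetical. Patterns are applied longest first so that the parenthesized variants are removed whole rather than leaving stray parentheses behind.

```typescript
// The five literal patterns listed above.
const DUPLICATED_CONDITIONS: string[] = [
  "インボイスを保存している。",
  "請求書等（インボイスに該当しない）を保存している。",
  "当社は少額特例の対象となる中小事業者である。",
  "（インボイスを保存している。）",
  "（請求書等（インボイスに該当しない）を保存している。）",
];

// Hypothetical helper: removes every occurrence of each pattern from the
// problem statement and trims surrounding whitespace.
function stripInvoiceConditions(problem: string): string {
  const longestFirst = [...DUPLICATED_CONDITIONS].sort(
    (a, b) => b.length - a.length,
  );
  let result = problem;
  for (const pattern of longestFirst) {
    // split/join performs a literal (non-regex) replace of all occurrences.
    result = result.split(pattern).join("");
  }
  return result.trim();
}
```

A blanket replacement like this would empty out problem statements that consist only of condition text, so such entries need to be excluded and handled by hand.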

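Special Cases (10 Entries Left As-Is)

The following entries contain invoice conditions in the problem statement but were left unchanged because their structure is unusual:

Type 1: the problem consists only of condition text

IDs 784, 788: 【原則】請求書等（インボイスに該当しない）を保存している。【工事が完成した課税期間】 ("[General rule] An invoice or similar document that is not a qualified invoice is retained. [Tax period in which the construction was completed]")

Type 2: invoice conditions are woven into the middle of the problem statement

IDs 854-861: composite patterns for material costs in building construction

5. Creating the Sequential ID Check Test

Purpose of the Test

Prevent a recurrence of duplicate IDs among invoice-related journal entries
Detect extraction gaps (missing numbers)

Test Code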

// apps/web/tests/check-sequential-ids.test.ts

import { describe, it, expect } from "vitest";
import * as fs from "fs";
import * as path from "path";

const DATA_DIR = path.resolve(__dirname, "../../../internal/data/extracted");

// Known missing IDs (extraction of the source data not yet complete)
const KNOWN_MISSING_IDS = new Set<number>([]);

describe("sequential ID check", () => {
  it("has sequential IDs with no duplicates or gaps", async () => {
    const files = getDataFiles();
    const allExamples: IdInfo[] = [];

    for (const file of files) {
      const module = await import(file);
      // Collect every array export in the module
      const examples: any[] = [];
      for (const key of Object.keys(module)) {
        if (Array.isArray(module[key])) {
          examples.push(...module[key]);
        }
      }
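      // ... collection logic (elided in this excerpt; sortedExamples and
      // idCounts used below are presumably built from it)
    }

    // Check for duplicate IDs
    const duplicates = [...idCounts.entries()]
      .filter(([_, infos]) => infos.length > 1);

    // Check for missing IDs
    const ids = sortedExamples.map((e) => parseInt(e.id, 10));
    const idSet = new Set(ids);
    const minId = Math.min(...ids);
    const maxId = Math.max(...ids);
    const missingIds: number[] = [];

    for (let i = minId; i <= maxId; i++) {
      if (!idSet.has(i)) {
        missingIds.push(i);
      }
    }

    // Not shown in the original excerpt; assumed to be the gaps that are
    // not already listed in KNOWN_MISSING_IDS
    const newMissing = missingIds.filter((id) => !KNOWN_MISSING_IDS.has(id));

    expect(duplicates.length).toBe(0);
    expect(newMissing.length).toBe(0);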
  });

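  it("checks duplicate IDs in detail (same ID reused across invoice conditions)", async () => {
    // Detect entries that share an ID but have different conditions.
    // idToEntries (ID -> entries) is built in code elided from this excerpt.
    const problematicIds = [];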

    for (const [id, entries] of idToEntries) {
      if (entries.length > 1) {
        const conditions = entries.map((e) => e.condition).filter(Boolean);
        const uniqueConditions = new Set(conditions);
        if (uniqueConditions.size > 1) {
          problematicIds.push({ id, entries });
        }
      }
    }

    expect(problematicIds.length).toBe(0);
  });
});
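
The core of the check can be isolated as a pure function, which makes it easy to exercise without loading the data files. This is a sketch under assumptions, not the test's actual code: the names IdReport and checkSequentialIds are hypothetical.

```typescript
interface IdReport {
  duplicates: number[]; // IDs that appear more than once
  missing: number[];    // gaps between the smallest and largest ID
}

// Hypothetical helper: reports duplicate IDs and gaps in the numeric sequence.
// IDs listed in `knownMissing` are not reported as gaps.
function checkSequentialIds(
  ids: string[],
  knownMissing: ReadonlySet<number> = new Set<number>(),
): IdReport {
  const counts = new Map<number, number>();
  for (const id of ids) {
    const n = parseInt(id, 10);
    if (Number.isNaN(n)) throw new Error(`non-numeric ID: ${id}`);
    counts.set(n, (counts.get(n) ?? 0) + 1);
  }

  const duplicates = Array.from(counts.entries())
    .filter(([, count]) => count > 1)
    .map(([n]) => n)
    .sort((a, b) => a - b);

  const missing: number[] = [];
  if (counts.size > 0) {
    const numbers = Array.from(counts.keys());
    const min = Math.min(...numbers);
    const max = Math.max(...numbers);
    for (let i = min; i <= max; i++) {
      if (!counts.has(i) && !knownMissing.has(i)) missing.push(i);
    }
  }
  return { duplicates, missing };
}
```

Looking IDs up in a Map keeps the gap scan linear in the size of the ID range, which matters little at a few hundred entries but avoids a repeated array search.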

Example Test Summary Output

========================================
Summary:
========================================
Total entries: 931
ID range: 1 - 931
Duplicates: 0
Missing: 0

6. Errors Encountered and Fixes

database is locked Error

Symptom: the @nuxt/content SQLite database is locked when running pnpm dev

Cause: the previous dev server did not shut down cleanly, leaving a lock on the SQLite file

Fix:

rm -rf apps/web/.nuxt apps/web/.data

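Summary

Completed data extraction for 886 cases (about 50 files)
The Miller columns viewer makes the data easy to browse
Keyboard navigation and position saving make the viewer comfortable to use
The sequential ID check test guards data quality
Grouping by problem statement makes it easy to compare patterns with and without a qualified invoice

Future work:

Implement the topic index (Index 2)
Add full-text search
Improve mobile support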