# CRITICAL VOTER EXTRACTION ISSUES - FIX REQUIRED

## Problems Identified

### 1. **Voter ID Numbers are Wrong**
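- The database stores XDQ0234484 against serial 5 (DEVADATCHANI)
- In `voters 46.png`, serial 5 is actually DGL0199083 (Ayyappan)
- In the image, DEVADATCHANI appears at serial 22 with ID XDQ0112813, so both the serial number and the voter ID stored for this voter are wrong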

### 2. **Serial Numbers are Incorrect**
- Grid detection is assigning wrong serial numbers to voter cards
- Spatial/grid-based serial assignment is unreliable

### 3. **Some Voter IDs Don't Exist in Source**
- Database has voter IDs that aren't present in the source images
- This indicates grid segmentation is completely misaligning cards

## Root Cause

The current `ParsesVoterBlocks.php` trait uses a complex spatial grid detection algorithm that:
1. Tries to detect grid columns/rows automatically using bounding box clustering
2. Attempts to match serial numbers with EPIC IDs in "header sections"
3. Uses spatial alignment to find voter IDs "above" cards

**This approach is fundamentally flawed** because:
- The voter cards have **EPIC IDs printed INSIDE each card** (top-right corner)
- Serial numbers are **INSIDE each card** (top-left corner)
- There is NO separate "header section" with voter IDs above cards
- Automatic grid detection is unnecessary and error-prone here, because the pages already use a consistent 3-column layout

## Solution Implemented

### Updated Files
1. **ParsesVoterBlocks.php** - Simplified EPIC ID extraction:
   - Removed spatial "header search" logic
   - Now searches ONLY inside each voter card for EPIC ID
   - Looks in top-right area of card where EPIC is actually printed
   - Extracts serial from top-left box inside card

2. **ProcessVoterImageBatch.php** - Already supports crop-based approach:
   - Has `use_crops` option to use grid cropping
   - Needs proper calibration for header skip and card dimensions
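
The card-local lookup described in item 1 can be sketched roughly as below. This is an illustration only, not the actual code in `ParsesVoterBlocks.php`: the function name `extractCardIdentifiers`, the word array shape (`text`, `x`, `y` relative to the card), the 35% top band, and the EPIC pattern (3 letters followed by 7 digits, inferred from the IDs quoted in this document) are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Extract the serial number and EPIC ID from the OCR words of ONE card.
 *
 * Hypothetical input: each word is ['text' => string, 'x' => int, 'y' => int],
 * with coordinates relative to the card's own top-left corner.
 */
function extractCardIdentifiers(array $words, int $cardWidth, int $cardHeight): array
{
    $serial = null;
    $epic = null;
    // Assumed: both identifiers sit in the top ~35% of the card.
    $topBand = (int) round($cardHeight * 0.35);

    foreach ($words as $word) {
        if ($word['y'] > $topBand) {
            continue; // skip text lower in the card (address, age, etc.)
        }

        $text = strtoupper(trim($word['text']));
        $inRightHalf = $word['x'] >= $cardWidth / 2;

        // EPIC ID: only accepted in the right half of the top band.
        if ($epic === null && $inRightHalf && preg_match('/^[A-Z]{3}\d{7}$/', $text)) {
            $epic = $text;
        }

        // Serial: a short run of digits in the left half of the top band.
        if ($serial === null && !$inRightHalf && preg_match('/^\d{1,4}$/', $text)) {
            $serial = (int) $text;
        }
    }

    return ['serial_number' => $serial, 'voter_id_number' => $epic];
}

// Example words for serial 5 of voters 46.png (coordinates are made up).
$words = [
    ['text' => '5', 'x' => 12, 'y' => 10],
    ['text' => 'DGL0199083', 'x' => 400, 'y' => 12],
    ['text' => 'Ayyappan', 'x' => 90, 'y' => 60],
];
print_r(extractCardIdentifiers($words, 550, 208));
```

Because the search never leaves the card's own bounding box, an ID from a neighbouring card cannot be attached to the wrong serial.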

## Recommended Approach

### Use Calibrated Crop-Based Import

```php
$options = [
    'use_crops' => true,
    'crop_options' => [
        'cols' => 3,              // Fixed 3-column layout
        'rows' => 10,             // 10 rows of voter cards
        'y_offset' => 200,        // Skip header section (200px)
        'crop_height' => 208,     // Card height (~208px for 2338px image)
        'crop_width' => 550,      // Card width (~550px for 1652px width / 3 cols)
        'pad_x' => 5,
        'pad_y' => 5
    ]
];

$job = new ProcessVoterImageBatch($directory, $startIndex, $maxImages, $options);
$results = $job->handle();
```

### Calibration Values for voters 46.png
- **Image size**: 1652 x 2338 pixels
- **Header height**: ~200px (contains "Assembly Constituency" and "Section" info)
- **Effective card area**: 2338 - 200 = 2138px
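- **Card height**: 2138 / 10 ≈ 214px per row if the cards run to the bottom edge; the 208px used above leaves 2138 - 10 × 208 = 58px unassigned at the bottom, so confirm which value matches the card borders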
- **Card width**: 1652 / 3 = ~550px per column
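
To sanity-check these numbers before running an import, the crop rectangles can be computed and inspected on their own. The sketch below is not the logic inside `ProcessVoterImageBatch`; `buildCropGrid` is a hypothetical helper, and treating `pad_x`/`pad_y` as padding that widens each crop (clamped to the image bounds) is an assumption about how those options are meant to work.

```php
<?php
declare(strict_types=1);

/**
 * Build crop rectangles for a fixed grid, row by row, left to right.
 * Option keys mirror crop_options; the returned shape is an assumption.
 */
function buildCropGrid(array $o, int $imageWidth, int $imageHeight): array
{
    $rects = [];

    for ($row = 0; $row < $o['rows']; $row++) {
        for ($col = 0; $col < $o['cols']; $col++) {
            // Expand each cell by the padding, then clamp to the image.
            $x = max(0, $col * $o['crop_width'] - $o['pad_x']);
            $y = max(0, $o['y_offset'] + $row * $o['crop_height'] - $o['pad_y']);
            $right = min($imageWidth, ($col + 1) * $o['crop_width'] + $o['pad_x']);
            $bottom = min(
                $imageHeight,
                $o['y_offset'] + ($row + 1) * $o['crop_height'] + $o['pad_y']
            );

            $rects[] = [
                'row' => $row,
                'col' => $col,
                'x' => $x,
                'y' => $y,
                'width' => $right - $x,
                'height' => $bottom - $y,
            ];
        }
    }

    return $rects;
}

$rects = buildCropGrid([
    'cols' => 3, 'rows' => 10, 'y_offset' => 200,
    'crop_height' => 208, 'crop_width' => 550,
    'pad_x' => 5, 'pad_y' => 5,
], 1652, 2338);

printf("%d crops, first: %s\n", count($rects), json_encode($rects[0]));
```

Drawing these rectangles over the page image is a quick way to see whether `y_offset` and `crop_height` line up with the real card borders.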

## Testing Commands

### 1. Test with Calibrated Crop Approach
```bash
php artisan tinker --execute="
use App\Jobs\ProcessVoterImageBatch;
use App\Models\Voter;
use Illuminate\Support\Facades\DB;

DB::statement('DELETE FROM survey_voters WHERE booth_number = 1');

\$options = [
    'use_crops' => true,
    'crop_options' => [
        'cols' => 3,
        'rows' => 10,
        'y_offset' => 200,
        'crop_height' => 208,
        'crop_width' => 550,
        'pad_x' => 5,
        'pad_y' => 5
    ]
];

\$job = new ProcessVoterImageBatch(__DIR__ . '/Output', 1, 1, \$options);
\$results = \$job->handle();

echo 'Imported: ' . \$results['total_imported'] . \"\\n\";
echo 'Skipped: ' . \$results['total_skipped'] . \"\\n\";

// Verify first 5 voters
\$voters = Voter::orderBy('serial_number')->take(5)->get();
foreach (\$voters as \$v) {
    echo \"Serial: {\$v->serial_number} | ID: {\$v->voter_id_number} | Name: {\$v->name}\\n\";
}
"
```

### 2. Expected Output (from voters 46.png)
- Serial 1 should be: KUPPUSAMI, XDQ0108977
- Serial 2 should be: RAJAMAHESHWARY, XDQ0015891
- Serial 3 should be: HEMAVATHY, XDQ0245423
- Serial 4 should be: VIJAYALAKSHMI, XDQ0108985
- Serial 5 should be: Ayyappan, DGL0199083
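
Comparing by eye is error-prone across many pages, so the check can be automated. The following is a minimal sketch under stated assumptions: `findMismatches` is a hypothetical helper, rows are plain arrays with `serial_number`, `voter_id_number`, and `name` keys (for example from `Voter::...->get()->toArray()`), and names are compared case-insensitively.

```php
<?php
declare(strict_types=1);

/**
 * Compare imported rows against expected values keyed by serial number.
 * Returns a list of human-readable problems; an empty list means a match.
 */
function findMismatches(array $expected, array $actualRows): array
{
    $bySerial = [];
    foreach ($actualRows as $row) {
        $bySerial[(int) $row['serial_number']] = $row;
    }

    $problems = [];
    foreach ($expected as $serial => [$name, $epic]) {
        if (!isset($bySerial[$serial])) {
            $problems[] = "Serial {$serial}: missing from import";
            continue;
        }

        $row = $bySerial[$serial];
        if ($row['voter_id_number'] !== $epic) {
            $problems[] = "Serial {$serial}: expected ID {$epic}, got {$row['voter_id_number']}";
        }
        if (strcasecmp($row['name'], $name) !== 0) {
            $problems[] = "Serial {$serial}: expected name {$name}, got {$row['name']}";
        }
    }

    return $problems;
}

// Expected values taken from voters 46.png.
$expected = [
    1 => ['KUPPUSAMI', 'XDQ0108977'],
    2 => ['RAJAMAHESHWARY', 'XDQ0015891'],
    3 => ['HEMAVATHY', 'XDQ0245423'],
    4 => ['VIJAYALAKSHMI', 'XDQ0108985'],
    5 => ['Ayyappan', 'DGL0199083'],
];

// Sample rows reproducing the serial 5 problem described at the top.
$actual = [
    ['serial_number' => 1, 'voter_id_number' => 'XDQ0108977', 'name' => 'KUPPUSAMI'],
    ['serial_number' => 5, 'voter_id_number' => 'XDQ0234484', 'name' => 'DEVADATCHANI'],
];

foreach (findMismatches($expected, $actual) as $problem) {
    echo $problem, "\n";
}
```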

## Next Steps

1. **Test the calibrated crop approach** with the command above
2. **Adjust offset/dimensions** if cards are still misaligned
3. **Verify voter IDs match** the actual cards in the image
4. **Test on all voter pages** (`voters 46.png` through `voters 86.png`) to ensure consistency

## Alternative: Use External Crop Tool

If the built-in crop still has issues, consider cropping outside the import job:
1. Use ImageMagick/GD to manually crop each card position
2. Save individual card images
3. Run OCR on each card separately with VoterBoxParser

Because each saved image corresponds to exactly one grid position, this gives a 1:1 mapping between card position and extracted data.
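
Steps 1 and 2 could look roughly like this with PHP's GD extension. It is a sketch, not part of the existing codebase: `cropCardsToFiles` and the `card_NN.png` naming are hypothetical, the rectangles are supplied by the caller, and the OCR step with VoterBoxParser is left out because its interface is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Crop each rectangle out of a page image and save it as its own PNG.
 * Each $rect needs 'x', 'y', 'width', 'height' (the keys imagecrop() expects).
 * Returns the written file paths in the same order as $rects.
 */
function cropCardsToFiles(string $pagePath, array $rects, string $outDir): array
{
    $page = imagecreatefrompng($pagePath);
    if ($page === false) {
        throw new RuntimeException("Cannot read page image: {$pagePath}");
    }
    if (!is_dir($outDir) && !mkdir($outDir, 0775, true)) {
        throw new RuntimeException("Cannot create output directory: {$outDir}");
    }

    $written = [];
    foreach (array_values($rects) as $i => $rect) {
        $card = imagecrop($page, [
            'x' => $rect['x'],
            'y' => $rect['y'],
            'width' => $rect['width'],
            'height' => $rect['height'],
        ]);
        if ($card === false) {
            continue; // rectangle fell outside the image; skip it
        }

        // The grid position is encoded in the file name (card_01.png, ...).
        $path = sprintf('%s/card_%02d.png', rtrim($outDir, '/'), $i + 1);
        imagepng($card, $path);
        $written[] = $path;
    }

    return $written;
}
```

Keeping the position in the file name means a bad OCR result can always be traced back to the exact crop that produced it.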