feat: add UPC to ASIN mapping and large file UPC analysis

Introduces the capability to resolve UPCs to ASINs using the Keepa API. This includes a new `upc-file` command for processing large Excel files of UPCs, a `upc` CLI tool for quick lookups, and API endpoints for web-based integration. The analysis pipeline was refactored into a reusable module to support both standard ASIN leads and new UPC-driven workflows.
Victor Noguera
2026-04-16 23:06:55 -04:00
parent d25cf5d5ec
commit 32e7b0c485
14 changed files with 2278 additions and 250 deletions


bun run src/sp-test.ts --sellability B07SN9BHVV # Standalone sellability check
```
## UPC to ASIN Mapping
You can map UPCs to ASINs directly through the Keepa integration in `src/keepa.ts`.
```ts
import { mapUpcsToAsins, lookupKeepaUpcs } from "./src/keepa.ts";
const upcs = ["012345678901", "098765432109", "112233445566"];
// Simple map output (UPC -> ASIN) for clean one-to-one matches only.
const asinMap = await mapUpcsToAsins(upcs);
for (const [upc, asin] of asinMap.entries()) {
  console.log(`UPC ${upc} -> ASIN ${asin}`);
}
// Rich output includes a status for every UPC (invalid, not found, collisions, etc.).
const details = await lookupKeepaUpcs(upcs);
for (const [upc, detail] of details.entries()) {
  console.log(upc, detail.status, detail.asin, detail.reason ?? "");
}
```
Behavior:
- Strict validation accepts only 12-, 13-, or 14-digit UPC values.
- If a UPC resolves to multiple ASINs, it is excluded from the simple map.
- The rich lookup returns all candidate ASINs and a status for each UPC.
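The validation and collision rules above can be sketched as follows. This is a minimal illustration that assumes a per-UPC list of candidate ASINs is already available; `isStrictUpc` and `toSimpleMap` are hypothetical names, not exports of `src/keepa.ts`, and no check-digit verification is assumed because the rules above only constrain the digit count.

```typescript
// Hypothetical helpers illustrating the documented rules; not the actual
// code in src/keepa.ts.

/** Accept only strings of exactly 12, 13, or 14 digits (no check-digit test). */
export function isStrictUpc(value: string): boolean {
  return /^\d{12,14}$/.test(value);
}

/**
 * Build the simple UPC -> ASIN map from per-UPC candidate lists.
 * UPCs that are invalid, unresolved (no candidates), or that resolve to
 * more than one distinct ASIN are left out.
 */
export function toSimpleMap(candidates: Map<string, string[]>): Map<string, string> {
  const result = new Map<string, string>();
  candidates.forEach((asins, upc) => {
    // Duplicate entries of the same ASIN still count as one match.
    const unique = Array.from(new Set(asins));
    if (isStrictUpc(upc) && unique.length === 1) {
      result.set(upc, unique[0]);
    }
  });
  return result;
}
```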
CLI usage:
```bash
bun run upc 012345678901 098765432109
bun run upc 012345678901,098765432109 --detailed
bun run upc --file upcs.txt --detailed --json
```
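One possible way to normalize the arguments shown above, so that space-separated and comma-separated UPCs behave the same. `parseUpcCliArgs` is a hypothetical helper, not the project's actual CLI code, and it only records the `--file` path rather than assuming a format for that file's contents.

```typescript
// Hypothetical argument normalization for the `upc` CLI; a sketch only.
export interface UpcCliOptions {
  upcs: string[];
  file?: string;
  detailed: boolean;
  json: boolean;
}

export function parseUpcCliArgs(argv: string[]): UpcCliOptions {
  const opts: UpcCliOptions = { upcs: [], detailed: false, json: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--detailed") {
      opts.detailed = true;
    } else if (arg === "--json") {
      opts.json = true;
    } else if (arg === "--file") {
      // The next argument is the path; its contents are read elsewhere.
      const path = argv[++i];
      if (path === undefined) throw new Error("--file requires a path");
      opts.file = path;
    } else if (arg.startsWith("--")) {
      throw new Error(`Unknown flag: ${arg}`);
    } else {
      // Positional values may be space- or comma-separated.
      for (const part of arg.split(",")) {
        const trimmed = part.trim();
        if (trimmed) opts.upcs.push(trimmed);
      }
    }
  }
  return opts;
}
```

Called as `parseUpcCliArgs(process.argv.slice(2))`, it skips the runtime and script paths that precede the user's arguments.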
API usage (when `bun run start:web` is running):
```bash
# Simple one-to-one mapping (GET)
curl "http://localhost:3000/api/upc/map?upc=012345678901&upc=098765432109"
# Detailed lookup with statuses (GET)
curl "http://localhost:3000/api/upc/lookup?upcs=012345678901,098765432109"
# Detailed lookup (POST JSON)
curl -X POST "http://localhost:3000/api/upc/lookup" \
  -H "content-type: application/json" \
  -d '{"upcs":["012345678901","098765432109"]}'
```
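The same endpoints can be called from TypeScript with the built-in `fetch`. This is a minimal sketch that assumes the server runs at the default `http://localhost:3000` and leaves the response typed as `unknown`, since its JSON shape is not documented here; `buildMapUrl` and `postLookup` are hypothetical names.

```typescript
// Minimal client sketch for the UPC endpoints; base URL and untyped
// responses are assumptions.
const BASE_URL = "http://localhost:3000";

/** GET /api/upc/map with one repeated `upc` query parameter per UPC. */
export function buildMapUrl(upcs: string[], base: string = BASE_URL): string {
  const params = new URLSearchParams();
  for (const upc of upcs) params.append("upc", upc);
  return `${base}/api/upc/map?${params.toString()}`;
}

/** POST /api/upc/lookup with a JSON body, mirroring the curl example. */
export async function postLookup(upcs: string[], base: string = BASE_URL): Promise<unknown> {
  const res = await fetch(`${base}/api/upc/lookup`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ upcs }),
  });
  if (!res.ok) throw new Error(`UPC lookup failed: HTTP ${res.status}`);
  return res.json();
}
```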
## Large UPC File Analysis (XLS/XLSX)
For very large Excel files that contain UPC values, use the dedicated `upc-file` process. It works in batches:
1. Reads UPC rows in batches (`.xlsx` files use a streaming reader; `.xls` files fall back to row-window parsing).
2. Resolves the UPCs to ASINs with Keepa.
3. Runs the same sellability check, Keepa/SP-API enrichment, and LLM verdict pipeline used for lead analysis.
4. Persists output to the existing `runs` and `results` tables, so results appear in the current reporting APIs and UI.
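The nested batching in the steps above can be sketched like this. It assumes UPC strings have already been extracted from the spreadsheet, and it injects hypothetical `resolve` and `analyze` callbacks in place of the real Keepa lookup and analysis pipeline; `chunk` and `processUpcRows` are illustrative names.

```typescript
// Hypothetical sketch of the batching flow; resolver and analyzer are
// injected placeholders, not the project's real functions.

/** Split an array into consecutive chunks of at most `size` items. */
export function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

type ResolveUpcs = (upcs: string[]) => Promise<Map<string, string>>; // UPC -> ASIN
type AnalyzeAsins = (asins: string[]) => Promise<void>; // enrich, verdict, persist

export async function processUpcRows(
  rows: string[],
  inputBatchSize: number,
  upcLookupBatchSize: number,
  resolve: ResolveUpcs,
  analyze: AnalyzeAsins,
): Promise<{ resolved: number; unresolved: number }> {
  let resolved = 0;
  let unresolved = 0;
  for (const batch of chunk(rows, inputBatchSize)) {
    const asins: string[] = [];
    // Each input batch is split again into smaller lookup calls.
    for (const lookup of chunk(batch, upcLookupBatchSize)) {
      const map = await resolve(lookup);
      for (const upc of lookup) {
        const asin = map.get(upc);
        if (asin) {
          asins.push(asin);
          resolved++;
        } else {
          unresolved++;
        }
      }
    }
    // Analyze one input batch before moving on to the next.
    if (asins.length > 0) await analyze(asins);
  }
  return { resolved, unresolved };
}
```

In this sketch each input batch is fully resolved and analyzed before the next one starts; fed by a streaming reader instead of an in-memory array, that pattern keeps only one batch in flight at a time.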
CLI usage:
```bash
bun run upc-file --input huge-upcs.xlsx
bun run upc-file --input huge-upcs.xls --input-batch-size 500 --upc-lookup-batch-size 100 --max-rows 10000
```
API usage (when `bun run start:web` is running):
```bash
curl -X POST "http://localhost:3000/api/process/upc-file" \
  -H "content-type: application/json" \
  -d '{
    "inputFile": "/absolute/path/to/huge-upcs.xlsx",
    "inputBatchSize": 300,
    "upcLookupBatchSize": 100
  }'
```
Request body fields:
- `inputFile` (required): path to an `.xls` or `.xlsx` file on the server's local filesystem.
- `outputFile` (optional): stored in run metadata.
- `inputBatchSize` (optional): number of input rows per processing batch (default `200`).
- `upcLookupBatchSize` (optional): UPC chunk size per Keepa lookup call (default `100`).
- `maxRows` (optional): maximum number of valid UPC rows to process, useful for dry runs.
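A typed view of these fields, with the documented defaults applied, might look like this. It is a sketch: the interface names, the file-extension check, and the `capValidRows` helper (which follows the `maxRows` description by counting only valid 12- to 14-digit rows) are assumptions, not the server's actual code.

```typescript
// Hypothetical typing of the documented request body; names and the
// validation rules are assumptions.
export interface UpcFileRequest {
  inputFile: string;
  outputFile?: string;
  inputBatchSize?: number;
  upcLookupBatchSize?: number;
  maxRows?: number;
}

export interface ResolvedUpcFileRequest extends UpcFileRequest {
  inputBatchSize: number;
  upcLookupBatchSize: number;
}

/** Apply the documented defaults (200 rows per batch, 100 UPCs per lookup). */
export function withDefaults(req: UpcFileRequest): ResolvedUpcFileRequest {
  if (!/\.xlsx?$/i.test(req.inputFile)) {
    throw new Error("inputFile must be an .xls or .xlsx path");
  }
  return {
    ...req,
    inputBatchSize: req.inputBatchSize ?? 200,
    upcLookupBatchSize: req.upcLookupBatchSize ?? 100,
  };
}

/** Keep only valid UPC rows, stopping once `maxRows` valid rows are collected. */
export function capValidRows(rows: string[], maxRows?: number): string[] {
  const out: string[] = [];
  for (const row of rows) {
    if (maxRows !== undefined && out.length >= maxRows) break;
    if (/^\d{12,14}$/.test(row)) out.push(row);
  }
  return out;
}
```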
The response contains run metadata and status counts, such as the reasons UPCs went unresolved and totals per lead verdict.
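Counts like these could be derived from per-UPC outcomes with a simple tally. The `UpcOutcome` shape and the status strings used here are hypothetical, since the actual response format is not documented in this section.

```typescript
// Hypothetical per-UPC outcome record; field names and status strings
// are assumptions, not the real response format.
export interface UpcOutcome {
  upc: string;
  unresolvedReason?: string; // e.g. "invalid", "not_found", "multiple_asins"
  verdict?: string; // LLM lead verdict for resolved UPCs
}

/** Count occurrences of each non-empty key. */
function tally(keys: Array<string | undefined>): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const key of keys) {
    if (key) counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

export function summarize(outcomes: UpcOutcome[]) {
  return {
    total: outcomes.length,
    unresolvedReasons: tally(outcomes.map((o) => o.unresolvedReason)),
    verdictTotals: tally(outcomes.map((o) => o.verdict)),
  };
}
```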
## Input file format
Accepts `.csv` or `.xlsx` files. Column names are matched case-insensitively. Required column: