feat: enhance README and index.ts for large file processing and output management

Victor Noguera
2026-04-11 16:57:27 -04:00
parent 0162e54007
commit 4386560964
2 changed files with 143 additions and 84 deletions

@@ -31,6 +31,12 @@ bun run src/index.ts leads.xlsx
bun run src/index.ts leads.csv --out results.xlsx
```
Large-file behavior:
- If the input has more than 50 products, it is processed in chunks of 50.
- Each chunk is analyzed and written to a numbered output file, for example: `results_part_001.xlsx`, `results_part_002.xlsx`, ...
- If `--out` is omitted for large files, the base output name defaults to `<input>_results.xlsx` and chunk files are still written with numbered suffixes.
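
The naming rule above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/index.ts`: the helper names `splitExt`, `defaultBaseName`, and `chunkFileName` are hypothetical, the three-digit padding is inferred from the `results_part_001.xlsx` example, and it assumes `<input>` means the input file name without its extension.

```typescript
// Hypothetical helpers: names and the three-digit padding are inferred
// from the README examples, not taken from src/index.ts.

// Split "dir/leads.csv" into ["dir/leads", ".csv"]; a name without an
// extension yields an empty string as the second element.
function splitExt(path: string): [string, string] {
  const slash = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  const dot = path.lastIndexOf(".");
  // Only treat the dot as an extension separator if it is inside the
  // file name and is not the file name's first character.
  return dot > slash + 1 ? [path.slice(0, dot), path.slice(dot)] : [path, ""];
}

// Assumed default when --out is omitted: "<input stem>_results.xlsx".
function defaultBaseName(inputPath: string): string {
  const [stem] = splitExt(inputPath);
  return `${stem}_results.xlsx`;
}

// Insert a 1-based, zero-padded part number before the extension.
function chunkFileName(basePath: string, part: number): string {
  const [stem, ext] = splitExt(basePath);
  return `${stem}_part_${String(part).padStart(3, "0")}${ext}`;
}

console.log(chunkFileName("results.xlsx", 1)); // results_part_001.xlsx
console.log(chunkFileName(defaultBaseName("leads.csv"), 2)); // leads_results_part_002.xlsx
```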
Quick SP-API connectivity tests:
```bash
@@ -90,8 +96,9 @@ Numeric parsing accepts plain numbers as well as formatted values like `$12.50`,
3. **Sellability gate** — check all uncached ASINs against SP-API `getListingsRestrictions` (concurrency: 5 workers); immediately skip ASINs with status `not_available` and `canSell=false` (no Keepa/fees wasted)
4. **Keepa fetch** — batch the sellable (uncached) ASINs in a single API call (up to 100 per request)
5. **Enrich** — fetch SP-API pricing + FBA/FBM fees for sellable ASINs; combine with Keepa data and spreadsheet data
6. **LLM analysis** — send batches of 5 sellable products to LM Studio for FBA/FBM/SKIP verdict; skipped ASINs get auto-SKIP verdict (confidence 100) and bypass LLM entirely
7. **Output** — print results table to console (includes all ASINs), optionally write CSV/XLSX
6. **LLM analysis** — send batches of 5 available products to LM Studio for FBA/FBM/SKIP verdict
7. **Chunk orchestration** — if input size is greater than 50, run phases 2-6 for each 50-item chunk sequentially
8. **Output** — print results table to console (includes all ASINs); for chunked runs, always write seriated chunk files (`*_part_001`, `*_part_002`, ...); for non-chunked runs, write a single file only when `--out` is provided
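
As a rough sketch of the chunking and output rules in the last two steps, the plan for a run could be computed up front and then executed one chunk at a time. The names `planChunks`, `runChunks`, `processChunk`, and `writeChunk` are hypothetical stand-ins (the last two for phases 2-6 and for the file writer); the real `src/index.ts` may be organized differently.

```typescript
// Hypothetical sketch; none of these names come from src/index.ts.
interface ChunkPlan {
  part: number | null; // 1-based part number, or null for a non-chunked run
  start: number;       // inclusive index into the input rows
  end: number;         // exclusive index into the input rows
  write: boolean;      // whether an output file is written for this entry
}

function planChunks(total: number, hasOut: boolean, chunkSize = 50): ChunkPlan[] {
  // Non-chunked run: a single pass, written only when --out was provided.
  if (total <= chunkSize) {
    return [{ part: null, start: 0, end: total, write: hasOut }];
  }
  // Chunked run: every chunk is always written to its own numbered file.
  const plan: ChunkPlan[] = [];
  for (let start = 0, part = 1; start < total; start += chunkSize, part++) {
    plan.push({ part, start, end: Math.min(start + chunkSize, total), write: true });
  }
  return plan;
}

async function runChunks<T, R>(
  items: T[],
  plan: ChunkPlan[],
  processChunk: (chunk: T[]) => Promise<R[]>,                // stand-in for phases 2-6
  writeChunk: (part: number | null, rows: R[]) => Promise<void>,
): Promise<R[]> {
  const all: R[] = [];
  for (const entry of plan) {
    // Awaiting inside the loop keeps chunks strictly sequential.
    const rows = await processChunk(items.slice(entry.start, entry.end));
    if (entry.write) await writeChunk(entry.part, rows);
    all.push(...rows);
  }
  return all;
}
```

With 120 input rows, `planChunks(120, false)` yields three entries covering rows 0-50, 50-100, and 100-120, all with `write: true`; with 50 rows and no `--out`, it yields one entry with `write: false`.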
## Output columns
@@ -119,11 +126,10 @@ ASIN, Name, Brand, Category, Unit Cost, Current Price, Avg Price 90d, Sales Rank
## Notes
- **Sellability-first optimization**: SP-API `getListingsRestrictions` is checked first to filter out unsellable items before consuming Keepa tokens or running full SP-API pricing/fees queries. This saves API calls and reduces runtime for large lead lists.
- **Available-only processing**: SP-API `getListingsRestrictions` is checked first and only ASINs with `sellabilityStatus=available` are enriched, analyzed, and included in outputs. Restricted, not_available, and unknown items are excluded.
- **SP-API concurrency**: `fetchSellabilityBatch` limits concurrent requests to 5 workers to avoid 429 throttling. Pricing+fees fetches also use 5 concurrent workers.
- **No batch endpoint**: Amazon SP-API does not provide batch endpoints for `getListingsRestrictions` or `getMyFeesEstimate*`. Concurrency limiting, together with the library's built-in `auto_request_throttled` safety net, prevents overwhelming the API.
- **Keepa rate limiting**: The client reads `tokensLeft` and `refillRate` from each API response and waits automatically when tokens are exhausted. With a Pro subscription (1 token/min), all 100 ASINs in a batch cost 1 token.
- **Redis is optional**: If Redis is unavailable, the tool runs without caching, so every run re-fetches from Keepa.
- **SP-API**: `src/sp-api.ts` provides `fetchSellability`, `fetchSellabilityBatch`, and `fetchSpApiPricingAndFees` functions. If SP-API credentials are missing or a call fails, the tool falls back to conservative fee defaults and keeps processing.
- **Skipped ASINs**: Products with `not_available` sellability status and `canSell=false` appear in output with verdict `SKIP`, confidence 100, and reasoning from the sellability check. They do not consume LLM inference.
- **Sandbox vs production**: When `SP_API_USE_SANDBOX=true`, calls for production ASINs can be denied. Use sandbox-compatible test data, or set the variable to `false` for live marketplace connectivity.
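
The 5-worker concurrency limit mentioned in the notes above can be illustrated with a small worker pool. This is a generic sketch under stated assumptions, not the implementation of `fetchSellabilityBatch`: `mapWithConcurrency` is a hypothetical helper, and the per-item `worker` callback stands in for a single SP-API request.

```typescript
// Hypothetical helper; not taken from src/sp-api.ts.
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all workers

  async function runWorker(): Promise<void> {
    while (true) {
      // Claiming an index is synchronous, so no two workers take the same item.
      const index = next++;
      if (index >= items.length) return;
      results[index] = await worker(items[index]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => runWorker()));
  return results;
}
```

A call such as `mapWithConcurrency(asins, 5, checkOne)` (where `checkOne` is whatever function performs one request) would keep at most five requests open at a time. Note that a rejected `worker` promise rejects the whole call; a production version would likely catch per-item errors instead.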