I just had to write a shim to ingest a 1999 COBOL copybook into our placement pipeline, complete with packed decimal dates. Curious if anyone else in 2025 is still mapping EBCDIC to UTF-8 in production, or did I win this round of collections tech archaeology?
Oldest I still parse is ISO 2709 MARC21 from 80s tape images. MARC-8/EACC to UTF‑8 is a regular chore, but not COBOL-level pain. I use yaz-marcdump/yaz-iconv (YAZ, from Index Data) to transcode and fix dodgy leader lengths, then normalize diacritics; it's like dusting a fossil without snapping it. You still seeing packed dates in prod, @OP?
I still feed a late‑80s VSAM export; one hard‑won tip: don’t run iconv over the whole record. Use the copybook to transcode only PIC X fields to UTF‑8 and leave COMP‑3 untouched, or you’ll shred the “packed decimal” dates. If the text looks off, sanity‑check the CCSID; lots of shops quietly flipped from CP037 to 1140 (only the euro at 0x9F changes) or to 1047/500, and the bracket/pipe chars will rat you out. Which code page did you land on?
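For anyone who wants the field-scoped idea in code, here is a minimal Python sketch. The layout, field names, and the cp037 default are assumptions made up for illustration; a real layout would come from the copybook. It decodes only the text fields and passes COMP-3 bytes through untouched.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    offset: int
    length: int
    kind: str  # "text" for PIC X, "packed" for COMP-3


# Hypothetical layout; in practice this is derived from the copybook.
LAYOUT = [
    Field("donor_name", 0, 10, "text"),
    Field("gift_date", 10, 5, "packed"),
]


def transcode_record(record: bytes, layout=LAYOUT, codepage: str = "cp037") -> dict:
    """Decode PIC X fields to str; return COMP-3 fields as raw bytes."""
    out = {}
    for field in layout:
        raw = record[field.offset:field.offset + field.length]
        if field.kind == "text":
            # Only text fields go through the EBCDIC codec.
            out[field.name] = raw.decode(codepage).rstrip()
        else:
            # Packed decimal is binary data, not characters: leave it alone.
            out[field.name] = raw
    return out


# Sample record: "SMITH" padded to 10 bytes in EBCDIC, then 19990315 packed.
sample = "SMITH".ljust(10).encode("cp037") + bytes.fromhex("019990315C")
result = transcode_record(sample)
```

Note that decoding the whole record with cp037 would not even raise, since every byte maps to some character, which is exactly why the damage to packed fields tends to go unnoticed.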
I’ve got a dBase III/IV pipeline still chewing on CP437 DBF + DBT memos from the late 80s, and the memo pointer rebuilds are somehow worse than your “packed decimal dates”. @wflores is right about field‑scoped transcoding; one trick that’s saved me on COBOL is a fast nibble check on COMP‑3 before any UTF‑8 step and rejecting records where the sign nibble isn’t C/D/F. Do you coerce bad packed dates to null or hard‑fail the ingest?
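A rough Python sketch of that nibble check, in case it helps. The function names are made up, and it follows the C/D/F rule from the post; some platforms also emit other sign nibbles, so treat the accepted set as an assumption to adjust.

```python
VALID_SIGNS = {0xC, 0xD, 0xF}  # C = positive, D = negative, F = unsigned


def is_valid_comp3(raw: bytes) -> bool:
    """True if every digit nibble is 0-9 and the last nibble is C, D, or F."""
    if not raw:
        return False
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    return sign in VALID_SIGNS and all(d <= 9 for d in digits)


def unpack_comp3(raw: bytes) -> int:
    """Decode a COMP-3 field to int, rejecting malformed input."""
    if not is_valid_comp3(raw):
        raise ValueError(f"not valid packed decimal: {raw.hex()}")
    value = 0
    for byte in raw[:-1]:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    # The last byte holds one digit in the high nibble, the sign in the low.
    value = value * 10 + (raw[-1] >> 4)
    return -value if (raw[-1] & 0x0F) == 0xD else value
```

Whether a failed check becomes a null or a hard failure is then just a matter of where you catch the `ValueError`.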
Not COBOL, but I still chew through Lotus 1‑2‑3 .WK1 donor sheets; ssconvert (Gnumeric) plus CP850→UTF‑8 gets me 90% there, and I convert their serial dates against an 1899‑12‑30 epoch to dodge the off‑by‑1 caused by 1‑2‑3 counting a 29 February 1900 that never existed. If you’re staying in the ‘EBCDIC to UTF‑8’ world, cb2xml gives clean field maps from a 1999 copybook so you can verify record lengths before anything hits the placement pipeline: https://github.com/cedar-software/cb2xml.
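The serial date trick looks roughly like this in Python. The function name is made up, and refusing serials below 61 is my own assumption: the 1899-12-30 epoch is only correct from 1 March 1900 onward, because earlier serials sit on the other side of the phantom leap day.

```python
from datetime import date, timedelta

# Epoch chosen so that serial 61 lands on 1900-03-01, skipping the
# nonexistent 1900-02-29 that 1-2-3 counts as serial 60.
EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Convert a spreadsheet serial day number to a calendar date."""
    if serial < 61:
        # Assumption: treat pre-March-1900 serials as suspect rather than
        # silently returning a date that is off by one.
        raise ValueError(f"serial {serial} predates 1900-03-01")
    return EPOCH + timedelta(days=serial)
```

For donor sheets this usually loses nothing, since real gift dates before March 1900 are rare enough to deserve a manual look anyway.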
Binary MARC21 (ISO 2709) still lands here; I sanity-check the leader/base address and directory (12-byte entries) first, then do MARC-8 to UTF-8 via pymarc marc8_to_unicode (https://pymarc.readthedocs.io/) so diacritics survive. Caveat: if 0x1E field or 0x1D record terminators are missing or the directory doesn’t line up, it goes to quarantine instead of a best-effort parse. Curious if anyone else still sees ‘MARC-8’ in the wild, because I feel like I’m hoarding diacritics like floppies.
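A stdlib-only Python sketch of that structural check, for illustration. The function name and the problem messages are made up; it only covers the leader lengths, the 12-byte directory entries, and the 0x1E/0x1D terminators, and does no MARC-8 conversion. An empty list means the record looks parseable; anything else would go to quarantine.

```python
from typing import List

FIELD_TERMINATOR = 0x1E
RECORD_TERMINATOR = 0x1D
LEADER_LEN = 24
ENTRY_LEN = 12  # tag (3) + field length (4) + start offset (5)


def check_marc_record(rec: bytes) -> List[str]:
    """Return a list of structural problems found in one ISO 2709 record."""
    if len(rec) < LEADER_LEN + 2:
        return ["record too short to hold a leader and directory"]
    try:
        rec_len = int(rec[0:5].decode("ascii"))
        base = int(rec[12:17].decode("ascii"))
    except ValueError:
        return ["non-numeric record length or base address in leader"]

    problems = []
    if rec_len != len(rec):
        problems.append(f"leader says {rec_len} bytes, got {len(rec)}")
    if rec[-1] != RECORD_TERMINATOR:
        problems.append("missing 0x1D record terminator")
    if not (LEADER_LEN < base < len(rec)):
        return problems + ["base address out of range"]
    if rec[base - 1] != FIELD_TERMINATOR:
        problems.append("directory not terminated by 0x1E")

    directory = rec[LEADER_LEN:base - 1]
    if len(directory) % ENTRY_LEN:
        return problems + ["directory length is not a multiple of 12"]

    for i in range(0, len(directory), ENTRY_LEN):
        entry = directory[i:i + ENTRY_LEN]
        try:
            tag = entry[0:3].decode("ascii")
            length = int(entry[3:7].decode("ascii"))
            start = int(entry[7:12].decode("ascii"))
        except ValueError:
            problems.append(f"unreadable directory entry at offset {i}")
            continue
        end = base + start + length
        if end > len(rec) or rec[end - 1] != FIELD_TERMINATOR:
            problems.append(f"field {tag} does not end on 0x1E")
    return problems
```

Returning a list of problems rather than raising on the first one makes the quarantine log more useful, since damaged tape images often have several things wrong at once.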
SGML EAD 1.0 is my oldest: I still run OpenSP’s onsgmls + osx to coax it into XML, making sure the catalog and SDATA entities are in place or it’s like convincing a cat to take a bath. If the DTDs are gone I drop in a local catalog; anyone else still wrangling TEI P3?