[HN Gopher] Show HN: COBOL-REKT, a toolkit for analysing and rev...
___________________________________________________________________
Show HN: COBOL-REKT, a toolkit for analysing and reverse-
engineering COBOL
This is an evolving toolkit of capabilities helpful for analysing
and reverse engineering legacy Cobol code. Currently, the following
capabilities are available:
- Program / Section-level flowchart generation based on AST (SVG or
  PNG)
- Parse Tree generation (with export to JSON)
- Control Flow Tree generation (with export to JSON)
- Allows embedding code comments as comment nodes in the graph
- The SMOJOL Interpreter (WIP)
- Injecting AST and Control Flow into Neo4J
- Injecting Cobol data layouts from Data Division into Neo4J (with
  dependencies like MOVE, COMPUTE, etc.) + export to JSON
- Injecting execution traces from the SMOJOL interpreter into Neo4J
- Integration with OpenAI GPT to summarise nodes using bottom-up
  node traversal (AST nodes or Data Structure nodes)
- Exposes a unified model (AST, CFG, Data Structures with
  appropriate interconnections) which can be analysed through
  [JGraphT](https://jgrapht.org/), together with export to GraphML
  format and JSON.
- Support for namespaces to allow unique addressing of (possibly
  same) graphs
- ALPHA: Support for building Glossary of Variables from data
  structures using LLMs
- ALPHA: Support for extracting Capability Graph from paragraphs of
  a program using LLMs
- ALPHA: Injecting inter-program dependencies into Neo4J (with
  export to JSON)
- ALPHA: Paragraph similarity map

Contributions / use cases are welcome!
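
The bottom-up node summarisation mentioned in the capability list can be pictured as a post-order traversal: each node is summarised only after its children. The following is a minimal sketch under stated assumptions, not the toolkit's implementation. The `Node` class and `fake_summarise` (a stand-in for the LLM call) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node; not the toolkit's actual model."""
    label: str
    children: list["Node"] = field(default_factory=list)
    summary: str = ""


def fake_summarise(label: str, child_summaries: list[str]) -> str:
    """Stand-in for an LLM call: combines a label with its children's summaries."""
    if not child_summaries:
        return label
    return f"{label}({', '.join(child_summaries)})"


def summarise_bottom_up(node: Node) -> str:
    """Post-order traversal: children are summarised before their parent."""
    child_summaries = [summarise_bottom_up(child) for child in node.children]
    node.summary = fake_summarise(node.label, child_summaries)
    return node.summary


# Hypothetical section with two paragraphs.
tree = Node("SECTION-MAIN", [
    Node("PARA-READ", [Node("READ"), Node("IF")]),
    Node("PARA-WRITE", [Node("WRITE")]),
])
result = summarise_bottom_up(tree)
```

Because every parent only sees the summaries of its children, the amount of text handed to the summariser at each step stays bounded by the node's fan-out rather than by the size of the whole program.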
Author : armorer
Score : 85 points
Date : 2024-08-15 10:04 UTC (12 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| robin_reala wrote:
| So obviously there's been a lot of legacy COBOL kicking around,
| but is this still the case? Would a new COBOL project have been
| started in the last 20 years? I kind of imagined that Java (or at
| least the JVM) has eaten its lunch.
| slowmotiony wrote:
| I worked for two very big banks in Europe and we have lots of
| COBOL batch jobs that were written like 30 or 40 years ago.
| jmclnx wrote:
| How many objects without source did you find there ?
|
| In the 80s (and probably before), every system I worked on
| had at least one critical COBOL program missing source code.
|
| I am wondering if you noticed the same with this project.
| b800h wrote:
| Missing source code? That's terrifying.
| alex_suzuki wrote:
| Par for the course.
| derriz wrote:
| I did some contracting for a retail bank a good while back
| and the COBOL source for the term deposit interest
| calculation routine had been lost a few decades earlier but
| was still in use. I suggested a rewrite but there was no
| enthusiasm/support for it - especially as my prototype
| could not exactly reproduce the rounding. Nobody wanted to
| have to deal with customer communications to explain why
| one cent of interest was being paid a month earlier or
| later than before.
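
A minimal sketch of why a rewrite can fail to reproduce the rounding exactly: the same interest figure lands on a different cent depending on the rounding rule. COBOL arithmetic truncates excess decimal places unless the ROUNDED phrase is given, so a port that rounds to nearest can drift by a cent. The balance and rate below are made-up values, and the lost routine's actual rule is unknown.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN

CENT = Decimal("0.01")


def monthly_interest(balance: Decimal, annual_rate: Decimal, rounding: str) -> Decimal:
    """Interest for one month, rounded to cents with the given mode."""
    raw = balance * annual_rate / Decimal(12)
    return raw.quantize(CENT, rounding=rounding)


balance = Decimal("1000.00")   # hypothetical balance
rate = Decimal("0.0303")       # hypothetical 3.03% annual rate

# The unrounded monthly interest is exactly 2.525, a tie between two cents.
results = {
    mode: monthly_interest(balance, rate, mode)
    for mode in (ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_EVEN)
}
```

Here truncation and banker's rounding give 2.52 while round-half-up gives 2.53. Intermediate precision can shift results in the same way, which is part of what makes a binary-only routine hard to match.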
| jmclnx wrote:
| > suggested a rewrite but there was no enthusiasm/support
| for it
|
| Been there :)
|
| Yes, you always get a "no" for doing that.
| SonOfLilit wrote:
| Ahaha I'm also working with a bank and "customer
| communications" is a really nice euphemism for "barrage
| of class action suits". I'm sitting here in the office
| laughing out loud.
| slowmotiony wrote:
| Totally normal, happened all the time :))
| armorer wrote:
| Yes, there's a lot of it kicking around. This evolved from a
| personal testbed to iterate on ideas to apply on actual legacy
| code modernisation work I've been involved in.
| Muromec wrote:
| So, is the color of your bank's logo blue, green, red or
| orange?
| nazgulsenpai wrote:
| My (non-bank) employer still has lots of COBOL in production,
| and it's constantly extended. Before working here I expected all
| COBOL to be running on some large IBM mainframe, but no. It's
| x86-64 Windows -- COBOL compiled to native Windows binaries.
|
| Edited to specify not a bank.
| Suppafly wrote:
| Is there some advantage to using COBOL or did they just have
| an old COBOL programmer who just kept using it?
| bigbuppo wrote:
| Every single attempt to replace COBOL has ended in failure.
| COBOL will never die. You can move it to new platforms, but
| it can never be replaced.
| CalumSult wrote:
| As someone doing support and occasional code changes for a
| pile of vb6 that doesn't sound that bad. If you need a code
| base to be stable for decades COBOL beats vb6.
| karlmdavis wrote:
| A number of US federal agencies still have astonishing amounts
| of it. The world's largest insurer, Medicare, uses 10M+ lines
| of COBOL to process the claims it receives -- total dollar
| amounts that make up 3% of the yearly GDP.
|
| Maintaining and modernizing these critical systems is important
| work.
| uselpa wrote:
| Our shop (a bank) still develops in COBOL on a daily basis (IBM
| mainframe) and has a stock of tens of millions of lines of code.
| ianmcgowan wrote:
| Converting to Lithp any day now...
| chasil wrote:
| Absolutely, we have several full time people who specialize in
| COBOL variants for VMS and OS2200.
| Muromec wrote:
| Places that have COBOL are the same places that have a bunch of
| laws as their list of requirements, and nobody stopped writing
| laws in those 20 years. Those are also the places where risk is
| supposed to be in a different department.
| SonOfLilit wrote:
| There are probably tens of thousands of active COBOL developers
| maintaining systems written no later than I guess the nineties.
| rodgerd wrote:
| Yes?
|
| Any "let's move this COBOL to something else" will, more often
| than not, flounder on the fact that "rewrite all this
| functionality" will cost massively more than "just find someone
| to extend what we already have".
| pmarreck wrote:
| Here's a crazy idea (and possibly a job opportunity for someone?)
|
| If someone built a tool to translate the AST generated by this
| into one of these newer theorem-proving dependently-typed
| languages (examples: Idris/Idris2 come to mind, but also the
| Coq/Rocq theorem prover, Agda, Lean), would it be theoretically
| possible to not only translate this code into a newer language
| but also suss out bugs and literally prove correctness? (Given
| how important some of this COBOL code seems to be, such as at
| Medicare)
|
| I know that one of the risks of changing the language that logic
| and computation are written in is unexpectedly changing the
| behavior or introducing new bugs; wondering if this might
| mitigate or almost entirely prevent that
| armorer wrote:
| It's not a crazy idea. For example, Amazon's BluAge offering
| does automatic translation. However, frequently, forward
| engineering teams do not want a 1:1 translation of the code,
| because that might end up reproducing the same system
| organisation of the original Cobol base (in a modern language),
| and engineers/architects usually want to work on a new design
| (while maintaining the original domain logic). So far, this
| library does not step into the forward engineering territory,
| and tries to merely provide useful information/artifacts, which
| could help the reverse engineering teams move (hopefully)
| faster, but obviously there is a lot of experimentation /
| inference that can be done on top of the extracted data.
| Closi wrote:
| Depends - sometimes/often the goal is just to bring it into a
| modern language to improve maintainability (i.e. it's an
| easier hire) rather than to do a ground-up rewrite.
|
| Especially as ground-up rewrites are often risky, and
| bringing it into a modern language might make it easier to
| incrementally improve/refactor over time.
| SonOfLilit wrote:
| When people say "bring this cobol code to a new language to
| improve maintainability", they don't just mean the syntax.
| Any C developer can learn cobol syntax in 10 minutes. They
| mean things like use functions instead of gotos, don't use
| just global variables, don't depend on lots of weird
| tooling from the parallel world of IBM mainframes, with
| most of your logic hidden in weird batch scripts with names
| like ABC@@DEF.
|
| I could write a cobol to c translator in a weekend. Nobody
| would buy it. Source: I've spent the last year and a half
| consulting on a huge project to rewrite a cobol codebase.
| pmarreck wrote:
| So basically the real work is not syntax translation
| (which a machine could do, albeit probably poorly) but
| semantics translation (which requires a deep
| understanding of both languages as well as what the
| code's intent is), so that you can do things like replace
| gotos with function calls without breaking expected
| behavior given inputs.
|
| A similar problem to translating procedural code in a
| language with mutable variables to functional code in a
| language with immutable variables. A lot of old functions
| in C etc. were expected to modify their passed-in
| arguments, for example, which would be a no-no today
| (note: not in the C space, I'm currently an Elixir dev,
| but I'm hoping that's now frowned upon!)
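
The contrast between mutating a passed-in argument and returning a new value can be sketched as follows. This is an illustrative example only; the account record and both function names are hypothetical.

```python
def apply_fee_in_place(account: dict, fee: int) -> None:
    """Older style: mutates the caller's data and returns nothing."""
    account["balance"] -= fee


def apply_fee_pure(account: dict, fee: int) -> dict:
    """Functional style: returns a new dict and leaves the input untouched."""
    return {**account, "balance": account["balance"] - fee}


original = {"id": "ACC-1", "balance": 100}
updated = apply_fee_pure(original, 5)      # original is unchanged

mutated = {"id": "ACC-2", "balance": 100}
apply_fee_in_place(mutated, 5)             # mutated now holds the new balance
```

A translation between the two styles has to find every caller that relied on the side effect, which is a question about semantics, not syntax.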
|
| I've noticed that LLMs are still not very good at this,
| too.
|
| I'm unfamiliar with COBOL but I'm currently looking for
| work; not sure if you'd be up for a conversation just to
| discuss your work since (for some odd reason) I enjoy
| refactoring (as well as software preservation and
| validation); at the very least I'd probably be a decent
| rubber-duck if you got stuck on something lol
| SonOfLilit wrote:
| Reading COBOL and translating its behavior 1:1 is maybe
| 1% of my job. There is so much more to modernising a
| bank. Technologically and culturally.
|
| I can discuss my work, but unless you have a work permit
| in Israel, I won't be able to hire you.
| Muromec wrote:
| Is there any approach other than extensively documenting
| both the intent and inner workings of the system and then
| doing ground up rewrite?
| SonOfLilit wrote:
| There are a hundred approaches for as many big banks with
| mainframe cores. The problem of modernizing a bank is
| much, much bigger than what language its core logic is
| written in.
|
| We're talking about a custom software stack grown from a
| first version written fifty years ago, before the word
| "coupling" meant anything to programmers.
| mgsouth wrote:
| Fully automated? I don't think so, and it falls apart
| surprisingly quickly. To give an example:
|
| All variables (well, in "classical COBOL" at least) are global
| variables. Since memory is constrained, a really common idiom
| is to have great swathes of punned, overlaid variables; in C
| terms it would be unioned structs. Subroutine A would have a
| var A-TEMPS divided into A-TEMP-VAR1 and A-TEMP-VAR2. Since
| routine B isn't on the same call path, that _same area_ could
| be also divided into B-TEMPS, B-TEMP-X, B-RESULT, and C-MOVEIN
| (because hey, code got changed). When you port this mess to
| Java you can (a) emulate unions with mind-bogglingly complex,
| huge, and fragile idioms, (b) tease out the actual code flow
| graphs and intent, or (c) some combination.
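
The overlaid-storage idiom described above can be sketched as one byte buffer that two routines read through different layouts. This is a rough illustration, not how any particular migration tool emulates it: the field widths and types are assumptions, and only the A-TEMP-VAR1, A-TEMP-VAR2, B-TEMP-X and B-RESULT names come from the comment.

```python
import struct

# One shared 8-byte working-storage area, reused by two unrelated routines.
shared_area = bytearray(8)


def routine_a(var1: int, var2: int) -> None:
    """Writes the area as two 32-bit integers (A-TEMP-VAR1, A-TEMP-VAR2)."""
    struct.pack_into(">ii", shared_area, 0, var1, var2)


def routine_b_read() -> tuple[bytes, int]:
    """Reads the same bytes as a 6-byte field (B-TEMP-X) plus a 16-bit B-RESULT."""
    return struct.unpack_from(">6sH", shared_area, 0)


# Routine A stores its temporaries...
routine_a(0x41424344, 0x45460007)
# ...and routine B sees those very bytes under its own layout.
b_temp_x, b_result = routine_b_read()
```

Any write through one layout silently changes what the other layout reads, so a port has to either keep the shared buffer or prove that the two routines never overlap in time.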
|
| And no, automated doesn't go very far for (b); although the
| computer might be able to figure out that March doesn't have a
| leap day and so this path won't execute because of that, it
| _is_ the end of a quarter and so has an artificial closing-day
| tacked on _if_ this is for subsidiary Foo, but not Bar because
| they have a 0-day on the following month. Too many combinations
| to exhaustively compute, and requires a lot of human smarts to
| prune possibilities.
| le-mark wrote:
| There's actually a lot of academic work around this from the
| 1990s; static analysis, reverse engineering, business logic
| extraction, re-engineering. All leading up to Y2K. There were
| quite a few commercial applications too. That all fizzled out
| after January 1, 2000 though.
| Retr0id wrote:
| I'm hoping it makes a comeback in the lead up to Y2038, heh
| childintime wrote:
| How about compiling Cobol to machine code, and then using an LLM
| to decompile to <your source language of choice>?
|
| This moves the focus from Cobol specific tools to Cobol agnostic
| tools.
| mburns wrote:
| Not sure what compiling it first gets you, but IBM had
| basically the same idea:
|
| https://news.ycombinator.com/item?id=38508250
| stuff4ben wrote:
| Kinda surprised this isn't an IBM tool. I suspect they could make
| a killing consulting/watsonXing with this.
| karmakaze wrote:
| There was a product in the 90s that ran on the PC and did static
| analysis on COBOL programs. I can't remember the exact name of
| it, something like renew-something-or-other. It had a query
| language where you could follow either the possible control flow
| or data flow from one point to others (or to a point from earlier
| ones).
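
The kind of query described here, following flow forward from a point or backward to it, amounts to graph reachability. Below is a minimal sketch using breadth-first search over a made-up control-flow graph; the paragraph names and edges are hypothetical and this is not the query language of the product mentioned.

```python
from collections import deque

# Hypothetical control-flow edges between paragraphs.
EDGES = {
    "MAIN": ["READ-REC", "EXIT"],
    "READ-REC": ["VALIDATE"],
    "VALIDATE": ["WRITE-REC", "ERROR"],
    "WRITE-REC": ["READ-REC"],
    "ERROR": ["EXIT"],
    "EXIT": [],
}


def reachable(edges: dict, start: str) -> set:
    """Breadth-first search: nodes reachable from start via at least one edge."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(edges: dict) -> dict:
    """Flips every edge so that a forward search answers 'what leads here?'."""
    rev = {node: [] for node in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


forward = reachable(EDGES, "VALIDATE")            # where can control go next?
backward = reachable(reverse(EDGES), "ERROR")     # which points can lead to ERROR?
```

Note that `VALIDATE` appears in its own forward set because it sits on a loop, while `ERROR` does not appear in its own backward set. The same search works for data flow if the edges represent assignments instead of jumps.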
|
| The only thing I've used like it recently was OQL (Object Query
| Language) for querying the Java heap.
|
| I remember Intellij had some static dataflow analysis and I do
| miss it working in RubyMine.
___________________________________________________________________
(page generated 2024-08-15 23:01 UTC)