[HN Gopher] Show HN: COBOL-REKT, a toolkit for analysing and rev...
       ___________________________________________________________________
        
       Show HN: COBOL-REKT, a toolkit for analysing and reverse-
       engineering COBOL
        
       This is an evolving toolkit of capabilities helpful for analysing
       and reverse engineering legacy Cobol code. Currently, the following
       capabilities are available:

       - Program / Section-level flowchart generation based on AST (SVG
         or PNG)
       - Parse Tree generation (with export to JSON)
       - Control Flow Tree generation (with export to JSON)
       - Allows embedding code comments as comment nodes in the graph
       - The SMOJOL Interpreter (WIP)
       - Injecting AST and Control Flow into Neo4J
       - Injecting Cobol data layouts from Data Division into Neo4J (with
         dependencies like MOVE, COMPUTE, etc.) + export to JSON
       - Injecting execution traces from the SMOJOL interpreter into Neo4J
       - Integration with OpenAI GPT to summarise nodes using bottom-up
         node traversal (AST nodes or Data Structure nodes)
       - Exposes a unified model (AST, CFG, Data Structures with
         appropriate interconnections) which can be analysed through
         [JGraphT](https://jgrapht.org/), together with export to GraphML
         format and JSON.
       - Support for namespaces to allow unique addressing of (possibly
         same) graphs
       - ALPHA: Support for building Glossary of Variables from data
         structures using LLMs
       - ALPHA: Support for extracting Capability Graph from paragraphs
         of a program using LLMs
       - ALPHA: Injecting inter-program dependencies into Neo4J (with
         export to JSON)
       - ALPHA: Paragraph similarity map

       Contributions / use cases are welcome!
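
The bottom-up node summarisation mentioned in the capability list can be pictured as a post-order tree walk. The following is only a minimal Python sketch under stated assumptions: `Node`, `summarise_bottom_up`, and `stub_summariser` are hypothetical names invented for this illustration and are not the toolkit's API, and the stub merely concatenates labels where a real setup would presumably call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical AST node; not the toolkit's real node type."""
    label: str
    children: List["Node"] = field(default_factory=list)
    summary: str = ""


def summarise_bottom_up(node: Node, summarise: Callable[[str, List[str]], str]) -> str:
    """Post-order walk: every child is summarised before its parent."""
    child_summaries = [summarise_bottom_up(c, summarise) for c in node.children]
    node.summary = summarise(node.label, child_summaries)
    return node.summary


def stub_summariser(label: str, child_summaries: List[str]) -> str:
    """Stand-in for an LLM call (assumption): it only concatenates text."""
    if not child_summaries:
        return label
    return f"{label}({', '.join(child_summaries)})"


# Made-up section with two paragraphs and a few statements
tree = Node("SECTION-A", [
    Node("PARA-1", [Node("MOVE"), Node("COMPUTE")]),
    Node("PARA-2", [Node("PERFORM")]),
])
result = summarise_bottom_up(tree, stub_summariser)
print(result)
```

Because each parent only sees its children's summaries, the text handed to the summariser stays small even for large programs, which is presumably the attraction of walking the tree bottom-up.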
        
       Author : armorer
       Score  : 85 points
       Date   : 2024-08-15 10:04 UTC (12 hours ago)
        
 (HTM) web link (github.com)
        
       | robin_reala wrote:
       | So obviously there's been a lot of legacy COBOL kicking around,
       | but is this still the case? Would a new COBOL project have been
       | started in the last 20 years? I kind of imagined that Java (or at
       | least the JVM) has eaten its lunch.
        
         | slowmotiony wrote:
          | I worked for two very big banks in Europe and we have lots of
         | cobol batch jobs that were written like 30 or 40 years ago.
        
           | jmclnx wrote:
            | How many objects without source did you find there?
           | 
           | In the 80s (and probably before), every system I worked on
           | had at least one critical COBOL program missing source code.
           | 
           | I am wondering if you noticed the same with this project.
        
             | b800h wrote:
             | Missing source code? That's terrifying.
        
               | alex_suzuki wrote:
               | Par for the course.
        
             | derriz wrote:
             | I did some contracting for a retail bank a good while back
             | and the COBOL source for the term deposit interest
             | calculation routine had been lost a few decades earlier but
             | was still in use. I suggested a rewrite but there was no
             | enthusiasm/support for it - especially as my prototype
             | could not exactly reproduce the rounding. Nobody wanted to
             | have to deal with customer communications to explain why
             | one cent of interest was being paid a month earlier or
             | later than before.
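
To see how a rewrite can differ from the original by a single cent, consider how the choice of rounding rule alone changes a result. This is a hedged Python sketch: the balance, the rate, and the `monthly_interest` helper are all hypothetical, and the rounding rule of the lost routine is unknown.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

CENT = Decimal("0.01")


def monthly_interest(balance: Decimal, annual_rate: Decimal, rounding: str) -> Decimal:
    """Simple one-month interest, rounded to the cent with the given rule."""
    raw = balance * annual_rate / Decimal(12)
    return raw.quantize(CENT, rounding=rounding)


balance = Decimal("1000.00")   # hypothetical deposit
rate = Decimal("0.0303")       # hypothetical 3.03% annual rate

# The unrounded interest is exactly 2.525, a tie between two cents
half_up = monthly_interest(balance, rate, ROUND_HALF_UP)
half_even = monthly_interest(balance, rate, ROUND_HALF_EVEN)
print(half_up, half_even)
```

Both rules are defensible, yet they disagree on exact ties, so a prototype that guesses the wrong one can only match the legacy output most of the time.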
        
               | jmclnx wrote:
               | > suggested a rewrite but there was no enthusiasm/support
               | for it
               | 
               | Been there :)
               | 
               | Yes, you always get a "no" for doing that.
        
               | SonOfLilit wrote:
               | Ahaha I'm also working with a bank and "customer
               | communications" is a really nice euphemism for "barrage
               | of class action suits". I'm sitting here in the office
               | laughing out loud.
        
             | slowmotiony wrote:
             | Totally normal, happened all the time :))
        
         | armorer wrote:
         | Yes, there's a lot of it kicking around. This evolved from a
         | personal testbed to iterate on ideas to apply on actual legacy
         | code modernisation work I've been involved in.
        
           | Muromec wrote:
           | So, is the color of your bank's logo blue, green, red or
           | orange?
        
         | nazgulsenpai wrote:
         | My (non-bank) employer still has lots of COBOL in production,
          | and it's constantly extended. Before working here I expected all
         | COBOL to be running on some large IBM mainframe, but no. It's
         | x86-64 Windows -- COBOL compiled to native Windows binaries.
         | 
         | Edited to specify not a bank.
        
           | Suppafly wrote:
           | Is there some advantage to using COBOL or did they just have
           | an old COBOL programmer who just kept using it?
        
             | bigbuppo wrote:
             | Every single attempt to replace COBOL has ended in failure.
             | COBOL will never die. You can move it to new platforms, but
             | it can never be replaced.
        
           | CalumSult wrote:
           | As someone doing support and occasional code changes for a
           | pile of vb6 that doesn't sound that bad. If you need a code
           | base to be stable for decades COBOL beats vb6.
        
         | karlmdavis wrote:
         | A number of US federal agencies still have astonishing amounts
         | of it. The world's largest insurer, Medicare, uses 10M+ lines
         | of COBOL to process the claims it receives -- total dollar
         | amounts that make up 3% of the yearly GDP.
         | 
         | Maintaining and modernizing these critical systems is important
         | work.
        
         | uselpa wrote:
          | Our shop (a bank) still develops in COBOL on a daily basis (IBM
          | mainframe) and has a stock of tens of millions of lines of code.
        
           | ianmcgowan wrote:
           | Converting to Lithp any day now...
        
         | chasil wrote:
         | Absolutely, we have several full time people who specialize in
         | COBOL variants for VMS and OS2200.
        
         | Muromec wrote:
          | Places that have cobol are the same places where a bunch of
          | laws is your list of requirements, and nobody stopped writing
          | laws in those 20 years. Those are also the places where risk is
          | supposed to be in a different department.
        
         | SonOfLilit wrote:
         | There are probably tens of thousands of active COBOL developers
         | maintaining systems written no later than I guess the nineties.
        
         | rodgerd wrote:
         | Yes?
         | 
          | Any "let's move this COBOL to something else" will, more often
         | than not, flounder on the fact that "rewrite all this
         | functionality" will cost massively more than "just find someone
         | to extend what we already have".
        
       | pmarreck wrote:
       | Here's a crazy idea (and possibly a job opportunity for someone?)
       | 
       | If someone built a tool to translate the AST generated by this
       | into one of these newer theorem-proving dependently-typed
       | languages (examples: Idris/Idris2 come to mind, but also the
       | Coq/Rocq theorem prover, Agda, Lean), would it be theoretically
       | possible to not only translate this code into a newer language
       | but also suss out bugs and literally prove correctness? (Given
       | how important some of this COBOL code seems to be, such as at
       | Medicare)
       | 
       | I know that one of the risks of changing the language that logic
       | and computation are written in is unexpectedly changing the
       | behavior or introducing new bugs; I'm wondering if this might
       | mitigate or almost entirely prevent that.
        
         | armorer wrote:
         | It's not a crazy idea. For example, Amazon's BluAge offering
         | does automatic translation. However, frequently, forward
         | engineering teams do not want a 1:1 translation of the code,
         | because that might end up reproducing the same system
         | organisation of the original Cobol base (in a modern language),
         | and engineers/architects usually want to work on a new design
         | (while maintaining the original domain logic). So far, this
         | library does not step into the forward engineering territory,
         | and tries to merely provide useful information/artifacts, which
         | could help the reverse engineering teams move (hopefully)
         | faster, but obviously there is a lot of experimentation /
         | inference that can be done on top of the extracted data.
        
           | Closi wrote:
           | Depends - sometimes/often the goal is just to bring it into a
            | modern language to improve maintainability (i.e. it's an
           | easier hire) rather than to do a ground-up rewrite.
           | 
           | Especially as ground-up rewrites are often risky, and
           | bringing it into a modern language might make it easier to
           | incrementally improve/refactor over time.
        
             | SonOfLilit wrote:
             | When people say "bring this cobol code to a new language to
             | improve maintainability", they don't just mean the syntax.
             | Any C developer can learn cobol syntax in 10 minutes. They
             | mean things like use functions instead of gotos, don't use
             | just global variables, don't depend on lots of weird
             | tooling from the parallel world of IBM mainframes, with
             | most of your logic hidden in weird batch scripts with names
             | like ABC@@DEF.
             | 
             | I could write a cobol to c translator in a weekend. Nobody
             | would buy it. Source: I've spent the last year and a half
             | consulting on a huge project to rewrite a cobol codebase.
        
               | pmarreck wrote:
               | So basically the real work is not syntax translation
               | (which a machine could do, albeit probably poorly) but
               | semantics translation (which requires a deep
               | understanding of both languages as well as what the
               | code's intent is), so that you can do things like replace
                | gotos with function calls without breaking expected
               | behavior given inputs.
               | 
               | A similar problem to translating procedural code in a
               | language with mutable variables to functional code in a
               | language with immutable variables. A lot of old functions
               | in C etc. were expected to modify their passed-in
               | arguments, for example, which would be a no-no today
               | (note: not in the C space, I'm currently an Elixir dev,
               | but I'm hoping that's now frowned upon!)
               | 
                | I've noticed that LLMs are still not very good at this,
               | too.
               | 
               | I'm unfamiliar with COBOL but I'm currently looking for
               | work; not sure if you'd be up for a conversation just to
               | discuss your work since (for some odd reason) I enjoy
               | refactoring (as well as software preservation and
               | validation); at the very least I'd probably be a decent
               | rubber-duck if you got stuck on something lol
        
               | SonOfLilit wrote:
               | Reading COBOL and translating its behavior 1:1 is maybe
               | 1% of my job. There is so much more to modernising a
               | bank. Technologically and culturally.
               | 
               | I can discuss my work, but unless you have a work permit
               | in Israel, I won't be able to hire you.
        
               | Muromec wrote:
               | Is there any approach other than extensively documenting
               | both the intent and inner workings of the system and then
               | doing ground up rewrite?
        
               | SonOfLilit wrote:
               | There are a hundred approaches for as many big banks with
               | mainframe cores. The problem of modernizing a bank is
               | much, much bigger than what language its core logic is
               | written in.
               | 
               | We're talking about a custom software stack grown from a
               | first version written fifty years ago, before the word
               | "coupling" meant anything to programmers.
        
         | mgsouth wrote:
         | Fully automated? I don't think so, and it falls apart
          | surprisingly quickly. To give an example:
         | 
         | All variables (well, in "classical COBOL" at least) are global
         | variables. Since memory is constrained, a really common idiom
          | is to have great swathes of punned, overlaid variables; in C
         | terms it would be unioned structs. Subroutine A would have a
         | var A-TEMPS divided into A-TEMP-VAR1 and A-TEMP-VAR2. Since
         | routine B isn't on the same call path, that _same area_ could
         | be also divided into B-TEMPS, B-TEMP-X, B-RESULT, and C-MOVEIN
         | (because hey, code got changed). When you port this mess to
          | Java you can (a) emulate unions with mind-bogglingly complex,
         | huge, and fragile idioms, (b) tease out the actual code flow
         | graphs and intent, or (c) some combination.
         | 
         | And no, automated doesn't go very far for (b); although the
         | computer might be able to figure out that March doesn't have a
         | leap day and so this path won't execute because of that, it
         | _is_ the end of a quarter and so has an artificial closing-day
         | tacked on _if_ this is for subsidiary Foo, but not Bar because
         | they have a 0-day on the following month. Too many combinations
         | to exhaustively compute, and requires a lot of human smarts to
         | prune possibilities.
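
The overlaid-storage idiom described above (expressed in COBOL with the REDEFINES clause) can be imitated with two sets of field views over one shared byte buffer. This is a minimal Python sketch: the field names come from the comment, but the widths, the 8-byte work area, and the `write_field` / `read_field` helpers are assumptions made for illustration.

```python
# One shared 8-byte work area, as if declared once in WORKING-STORAGE
storage = bytearray(b" " * 8)

# Routine A's view of the area (hypothetical widths)
A_TEMP_VAR1 = slice(0, 4)
A_TEMP_VAR2 = slice(4, 8)

# Routine B's view of the SAME bytes (hypothetical widths)
B_TEMP_X = slice(0, 2)
B_RESULT = slice(2, 8)


def write_field(area: bytearray, fld: slice, text: str) -> None:
    """Store text in a fixed-width field, space-padded or truncated."""
    width = fld.stop - fld.start
    area[fld] = text.ljust(width)[:width].encode("ascii")


def read_field(area: bytearray, fld: slice) -> str:
    """Read a fixed-width field back as text."""
    return area[fld].decode("ascii")


# Routine A fills its temporaries
write_field(storage, A_TEMP_VAR1, "1234")
write_field(storage, A_TEMP_VAR2, "5678")

# Routine B writes through its own layout and silently clobbers A's fields
write_field(storage, B_RESULT, "ZZZZZZ")

a1 = read_field(storage, A_TEMP_VAR1)
a2 = read_field(storage, A_TEMP_VAR2)
print(a1, a2)
```

Nothing in either layout says that A and B must never be live at the same time; that rule exists only in the call paths, which is why a port has to recover the control flow before it can safely split the area into separate variables.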
        
       | le-mark wrote:
       | There's actually a lot of academic work around this from the
       | 1990s; static analysis, reverse engineering, business logic
       | extraction, re-engineering. All leading up to Y2K. There were
       | quite a few commercial applications too. That all fizzled out
       | after January 1, 2000 though.
        
         | Retr0id wrote:
         | I'm hoping it makes a comeback in the lead up to Y2038, heh
        
       | childintime wrote:
       | How about compiling Cobol to machine code, and then using an LLM
       | to decompile to <your source language of choice>?
       | 
       | This moves the focus from Cobol specific tools to Cobol agnostic
       | tools.
        
         | mburns wrote:
         | Not sure what compiling it first gets you, but IBM had
         | basically the same idea:
         | 
         | https://news.ycombinator.com/item?id=38508250
        
       | stuff4ben wrote:
       | Kinda surprised this isn't an IBM tool. I suspect they could make
       | a killing consulting/watsonXing with this.
        
       | karmakaze wrote:
       | There was a product in the 90s that ran on the PC and did static
       | analysis on COBOL programs. I can't remember the exact name of
       | it, something like renew-something-or-other. It had a query
       | language where you could follow either the possible control flow
       | or data flow from one point to others (or to a point from earlier
       | ones).
       | 
       | The only thing I've used like it recently was OQL (Object Query
       | Language) for querying the Java heap.
       | 
       | I remember Intellij had some static dataflow analysis and I do
       | miss it working in RubyMine.
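
The kind of query described above, following possible control flow forward from a point or backward to it, amounts to reachability over a graph. This is a small Python sketch under stated assumptions: the paragraph names and edges in `cfg` are invented, and `reachable` and `reverse` are hypothetical helpers, not part of any product mentioned in this thread.

```python
from collections import deque

# Hypothetical control flow graph: paragraph -> possible successors
cfg = {
    "READ-INPUT": ["VALIDATE"],
    "VALIDATE": ["CALC-INTEREST", "REJECT"],
    "CALC-INTEREST": ["WRITE-OUTPUT"],
    "REJECT": ["WRITE-OUTPUT"],
    "WRITE-OUTPUT": [],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start in one or more steps."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def reverse(graph):
    """Flip every edge so forward search answers 'what can lead here?'."""
    rev = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


forward = reachable(cfg, "VALIDATE")                   # what can run after VALIDATE
backward = reachable(reverse(cfg), "CALC-INTEREST")    # what can run before CALC-INTEREST
print(sorted(forward), sorted(backward))
```

The same two functions work for data flow if the edges mean "value flows from X to Y" instead of "control passes from X to Y".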
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:01 UTC)