[HN Gopher] Why are cancer guidelines stuck in PDFs?
       ___________________________________________________________________
        
       Why are cancer guidelines stuck in PDFs?
        
       Author : huerne
       Score  : 249 points
       Date   : 2024-12-23 23:36 UTC (23 hours ago)
        
 (HTM) web link (seangeiger.substack.com)
 (TXT) w3m dump (seangeiger.substack.com)
        
       | osmano807 wrote:
       | I know it's not the same, but in many areas we have this "follow
       | the arrows" system in many guidelines. For some examples, see the
       | EULAR guidelines with its flowcharts for treatments and also AO
       | Surgery Reference with a graphical approach to select treatments
       | based on fracture pattern, available materials and skill set.
       | 
       | I think that's a logical and necessary step to join medical
       | reasoning and computer helpers, we need easier access to new
       | information and more importantly to present clinically relevant
       | facts from the literature in a way that helps actual patient care
       | decision making.
       | 
       | I'm just not too sure we can have generic approaches to all
       | specialties, but it's nice seeing efforts in this area.
        
       | pcrh wrote:
       | The fundamental idea here is that doctors find it difficult to
       | ensure that their recommendations are actually up-to-date with
       | the latest clinical research.
       | 
       | Further, that by virtue of being at the centre of action in
       | research, doctors in prestige medical centres have an advantage
       | that _could_ be available to all doctors. It's a pretty
       | important point, sometimes referred to as the dissemination of
       | knowledge problem.
       | 
       | Currently, this is best approached by publishing systematic
       | reviews according to the Cochrane Criteria [0]. Such reviews are
       | quite labour-intensive and done all too rarely, but are very
       | valuable when done.
       | 
       | One aspect of such reviews, when done, is how often they discard
       | published studies for reasons such as bias, incomplete datasets,
       | and so forth.
       | 
       | The approach described by Geiger in the link is commendable for
       | its intentions but the outcome will be faced with the same
       | problem that manual systematic reviews face.
       | 
       | I wonder if the author considered including rules-based approaches
       | (e.g. Cochrane guidelines) in addition to machine learning
       | approaches?
       | 
       | [0] https://training.cochrane.org/handbook
        
         | slaucon wrote:
         | Hey author here--Cochrane reviews are great.
         | 
         | NCCN guidelines and Cochrane Reviews serve complementary roles
         | in medicine - NCCN provides practical, frequently updated
         | cancer treatment algorithms based on both research and expert
         | consensus, while Cochrane Reviews offer rigorous systematic
         | analyses of research evidence across all medical fields with a
         | stronger focus on randomized controlled trials. The NCCN
         | guidelines tend to be more immediately applicable in clinical
         | practice, while Cochrane Reviews provide a deeper analysis of
         | the underlying evidence quality.
         | 
         | My main goal here was to show what you could do with any set of
         | medical guidelines that was properly structured. You can choose
         | any criteria you want.
        
         | liontwist wrote:
         | > doctors find it difficult to ensure that their
         | recommendations are actually up-to-date with the latest
         | clinical research
         | 
         | Doctors care about this as much as software engineers care
         | about the latest computer science research. A few curious ones
         | do. But the general attitude is they already did tough years of
         | school so they don't have to anymore.
        
           | refurb wrote:
           | I worked with oncologists and this isn't true.
           | 
           | Oncology has a rapidly changing treatment landscape and it's
           | common for oncologists to be discussing the latest paper that
           | has come out.
           | 
           | If you're an oncologist and not keeping up with the
           | literature you're going to be out of date in your decisions
           | in about 6 months from graduation.
        
             | liontwist wrote:
             | Funnily enough, that last paragraph is also said of software
             | engineers. Neither is true.
        
               | mort96 wrote:
               | Yeah, non-programmers seem to think everything is
               | changing so quickly all the time yet here I am writing in
               | a 40 year old language against UNIX APIs from the 70s
               | ¯\_(ツ)_/¯
        
         | resource_waste wrote:
         | It amazes me that AI isn't a borderline requirement for being a
         | doctor. Think of how much info is outdated or just wrong.
        
       | prepend wrote:
       | I'd rather have the pdf than a custom tool. Especially
       | considering the tool will be unique to the practice or emr. And
       | likely expensive to maintain.
       | 
       | PDFs suck in many ways but are durable and portable. If I work
       | with two oncologists, I use the same pdf.
       | 
       | The author means well but his solution will likely be worse
       | because only he will understand it. And there's a million edge
       | cases.
        
         | akoboldfrying wrote:
         | The author is proposing that the DAG representation be _in
         | addition to_ the PDF:
         | 
         | >The organizations drafting guidelines should release them in
         | structured, machine-interpretable formats in addition to the
         | downloadable PDFs.
         | 
         | My opinion: Ideally the PDF could be _generated from_ the
         | underlying DAG -- that would give you confidence that
         | everything in the PDF has been captured in the DAG.
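
A minimal sketch of the "generate the document from the DAG" idea, assuming a guideline can be stored as nodes with labelled outgoing edges. The node names, conditions, and the `GUIDELINE` structure are invented for illustration and are not taken from any real guideline or from the author's tool. The renderer emits plain text; a real pipeline would hand the same traversal to a PDF toolchain.

```python
from collections import deque

# Hypothetical guideline fragment: node id -> (label, [(condition, next node id)]).
# All names and conditions here are made up for illustration.
GUIDELINE = {
    "workup": ("Initial workup", [("localized", "surgery"), ("metastatic", "systemic")]),
    "surgery": ("Surgical resection", [("margins clear", "followup")]),
    "systemic": ("Systemic therapy", [("response", "followup")]),
    "followup": ("Surveillance", []),
}


def topological_order(graph):
    """Return node ids so each node comes after every node that points to it (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for _, edges in graph.values():
        for _, target in edges:
            indegree[target] += 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, target in graph[node][1]:
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    if len(order) != len(graph):
        raise ValueError("guideline graph contains a cycle")
    return order


def render_document(graph):
    """Render every node and every edge as plain text, in dependency order."""
    lines = []
    for node in topological_order(graph):
        label, edges = graph[node]
        lines.append(f"[{node}] {label}")
        for condition, target in edges:
            lines.append(f"    if {condition} -> [{target}]")
    return "\n".join(lines)
```

Calling `print(render_document(GUIDELINE))` lists each step followed by its branches. Because the text is produced by walking the graph, nothing can appear in the document that is missing from the graph, which is the confidence property the comment above is after.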
        
           | maxerickson wrote:
           | You could generate the document from the graph and then
           | attach it as data.
        
             | JumpCrisscross wrote:
             | > _could generate the document from the graph and then
             | attach it as data_
             | 
             | Much easier for doctors to draft PDFs than graphs.
        
               | seb1204 wrote:
               | I have not drafted a PDF myself and I doubt doctors will.
               | They work in a text writer or spreadsheet application and
               | then export or print to PDF would be my guess. An
               | interactive interface could spit out the PDF with the
               | decision tree in the end. This solution would still mean
               | the decision tree source is in some software package.
        
         | crazygringo wrote:
         | Exactly. The PDFs _work_. They won't break. You can see all
         | the information with your own eyes. You can send them by
         | e-mail.
         | 
         | A wizard-type system hides most of the information from you, it
         | might have bugs you aren't aware of, if you want to glance at
         | an alternative path you can't, it's going to be locked into
         | registered users, the system can go down.
         | 
         | I think much more intelligent computer systems are the future
         | in health care, but I doubt the way to start is with yet
         | another custom tool designed specifically for cancer guidelines
         | and nothing else.
        
           | crabmusket wrote:
           | > it's going to be locked into registered users, the system
           | can go down
           | 
           | I didn't see anything in the screenshots presented that
           | wouldn't be doable in a single HTML file containing the data,
           | styles and scripts?
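
A rough sketch of the single-file approach, assuming the guideline is small enough to embed as JSON. The `GUIDELINE` data, the `build_page` helper, and the page layout are all hypothetical and are not how the author's demo is built. The output is one HTML file with data, styles, and script inline, so it needs no server or login.

```python
import html
import json

# Hypothetical data; a real file would embed the full guideline graph.
GUIDELINE = {
    "start": {"text": "Initial workup", "next": {"localized": "surgery", "metastatic": "systemic"}},
    "surgery": {"text": "Surgical resection", "next": {}},
    "systemic": {"text": "Systemic therapy", "next": {}},
}

TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>__TITLE__</title>
<style>button { display: block; margin: 0.5em 0; }</style>
</head>
<body>
<h1>__TITLE__</h1>
<div id="app"></div>
<script id="data" type="application/json">__DATA__</script>
<script>
const graph = JSON.parse(document.getElementById("data").textContent);
function show(id) {
  const app = document.getElementById("app");
  app.textContent = graph[id].text;
  for (const [label, target] of Object.entries(graph[id].next)) {
    const button = document.createElement("button");
    button.textContent = label;
    button.onclick = () => show(target);
    app.appendChild(button);
  }
}
show("start");
</script>
</body>
</html>
"""


def build_page(title, graph):
    """Return one self-contained HTML document with the graph embedded as JSON."""
    # Escape "</" so embedded text cannot close the script element early.
    data = json.dumps(graph).replace("</", "<\\/")
    return TEMPLATE.replace("__TITLE__", html.escape(title)).replace("__DATA__", data)
```

Writing the result of `build_page("Demo guideline", GUIDELINE)` to a `.html` file gives something that can be e-mailed and opened offline, much like a PDF. Whether it would render identically in 20 years is a separate question that this sketch does not answer.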
           | 
           | This is a countercultural idea but it fits so many use cases;
           | it's a tragedy we don't do this more often. The two options
           | are either PDF or SaaS.
        
           | ajsnigrutin wrote:
           | > The PDFs work. They won't break.
           | 
           | Not just that, PDFs are one of the few formats where I'm
           | willing to bet my own money that they'll still work in 10 or
           | 20 years.
           | 
           | Even basic html has changed, layouts look different depending
           | on many factors, and even the <blink>-ing doesn't work
           | anymore.
        
             | seb1204 wrote:
             | A case specific PDF could be created and stored in the
             | patient's electronic records. Such PDF could just highlight
             | the decision tree path.
        
             | inferiorhuman wrote:
             | Sure, PDF/A is an ISO-standardized subset of the larger PDF
             | spec designed expressly for archival purposes. You could do
             | that with HTML but then how would you get your crypto
             | mining AI chat bot powered by WASM to work?
        
         | slaucon wrote:
         | Hey author here! Appreciate the feedback! Agreed on importance
         | of portability and durability.
         | 
         | I'm not trying to build this out or sell it as a tool to
         | providers. Just wanted to demo what you could do with
         | structured guidelines. I don't think there's any reason this
         | would have to be unique to a practice or emr.
         | 
         | As sister comments mentioned, I think the ideal case here would
         | be if the guideline institutions released the structured
         | representations of the guidelines along with the PDF versions.
         | They could use a tool to draft them that could export in both
         | formats. Oncologists could use the PDFs still, and systems
         | could lean into the structured data.
        
           | Dalewyn wrote:
           | >Agreed on importance of portability and durability.
           | 
           | I think "importance" is understating it, because permanent
           | consistency is practically the only reason we all (still) use
           | PDFs in quite literally every professional environment as a
           | lowest common denominator industrial standard.
           | 
           | PDFs will always render the same, whether on paper or a
           | screen of any size connected to a computer of any
           | configuration. PDFs will almost always open and work given
           | Adobe Reader, which these days is simply embedded in Chrome.
           | 
           | PDFs will almost certainly Just Work(tm), and Just
           | Working(tm) is a god damn virtue in the professional world
           | because time is money and nobody wants to be embarrassed
           | handing out unusable documents.
        
             | abtinf wrote:
             | PDFs generally will look close enough to the original
             | intent that they will almost always be usable, but will not
             | always render the same. If nothing else, there are
             | seemingly endless font issues.
        
               | lstamour wrote:
               | In this day and age that seems increasingly like a solved
               | problem to most end users, often a client-side issue or
               | using a very old method of generating a PDF?
               | 
               | Modern PDF supports font embedding of various kinds
               | (legality is left as an exercise to the PDF author) and
               | supports 14 standard font faces which can be specified
               | for compatibility, though more often document authors
               | probably assume a system font is available or embed one.
               | 
               | There are still problems with the format as it foremost
               | focuses on document display rather than document
               | structure or intent, and accessibility support in
               | documents is often rare to non-existent outside of
               | government use cases or maybe Word and the like.
               | 
               | A lot of usability improvements come from clients that
               | make an attempt to parse the PDF to make the format
               | appear smarter. macOS Preview can figure out where
               | columns begin and end for natural text selection, Acrobat
               | routinely generates an accessible version of a document
               | after opening it, including some table detection.
               | Honestly creative interpretation of PDF documents is
               | possibly one of the best use cases of AI that I've ever
               | heard of.
               | 
               | While a lot about PDF has changed over the years the
               | basic standard was created to optimize for printing. It's
               | as if we started with GIF and added support to build
               | interactive websites from GIFs. At its core, a PDF is
               | just a representation of shapes on a page, and we added
               | metadata that would hopefully identify glyphs, accessible
               | alternative content, and smarter text/line selection, but
               | it can fall apart if the PDF author is careless,
               | malicious or didn't expect certain content. It probably
               | inherits all the weirdness of Unicode and then some, for
               | example.
        
               | seb1204 wrote:
               | I would assume these decision tree PDFs use a commonly
               | available font. Layout and interpreted outcomes should be
               | the same.
        
           | killjoywashere wrote:
           | The cancer reporting protocols from the College of American
           | Pathologists are available in structured format (1). No major
           | laboratory information system vendor implements them
           | properly, and their implementation errors cause some
           | not-insignificant problems with patient care (oncologists
           | calling the lab asking for clarification, etc). This has
           | pushed labs to make policies disallowing the use of those
           | modules and individual pathologists reverting to their own
           | non-portable templates in Word documents.
           | 
           | The medical information systems vendors are right up there
           | with health insurance companies in terms of their investment
           | in ensuring patient deaths. Ensuring. With an E.
           | 
           | (1) https://www.cap.org/protocols-and-guidelines/electronic-
           | canc...
        
             | all2 wrote:
             | > The medical information systems vendors are right up
             | there with health insurance companies in terms of their
             | investment in ensuring patient deaths. Ensuring. With an E.
             | 
             | Can you expand on this?
        
               | righthand wrote:
               | Medical information system vendors only care about making
               | a profit, not implementing actual solutions. The
               | discrepancies between systems can lead to bad information
               | which can cost people their life.
        
               | ethbr1 wrote:
               | As an analogy, imagine if the consequence of Oracle doing
               | Oracle-as-usual things was worse medical outcomes. But
               | they did them anyway for profit.
               | 
               | That's basically medical information system vendors.
               | 
               | The fact that the US hasn't pushed open source EMRs
               | through CMS is insane. It's literally the perfect problem
               | for an open solution.
        
               | caboteria wrote:
               | It's worse than that. VistA is a world-class open source
               | EMR that the VA has been trying to kill for decades.
        
             | PoignardAzur wrote:
             | I mean, you're attributing malice, but it could just be
             | that reliably implementing the formats is a really really
             | hard problem?
        
               | TheAceOfHearts wrote:
               | How about fixing the format? Something that is obviously
               | broken and resulting in patient deaths should really be
               | considered a top priority. It's either malice or massive
               | incompetence. If these protocols were open there would
               | definitely be volunteers willing to help fix it.
        
               | mort96 wrote:
               | Incompetence at this level is intentional, it means
               | someone doesn't think they'll see RoI from investing
               | resources into improving it. Calling it malice is
               | appropriate I feel.
        
               | layer8 wrote:
               | If there is no ROI, investing further resources would be
               | charity work. I don't think it's accurate to call a
               | company not doing so malicious.
        
               | WitCanStain wrote:
               | Not actively malicious perhaps, but prioritising profits
               | over lives is evil. Either you take care to make sure the
               | systems you sell lead to the best possible outcomes, or
               | you get out of the sector.
        
               | layer8 wrote:
               | The company not existing at all might be worse though? I
               | think it's too easy to make blanket judgments like that
               | from the outside, and it would be the job of regulation
               | to counteract adverse incentives in the field.
        
               | PoignardAzur wrote:
               | You seem to think that the default assumption is that
               | fixing the format is easy/feasible, and I don't see why.
               | Do you have domain knowledge pointing that way?
               | 
               | It's a truism in machine learning that curating and
               | massaging your dataset is the most labor-intensive and
               | error-prone part of any project. I don't see why that would
               | stop being true in healthcare just because lives are on
               | the line.
        
               | prepend wrote:
               | I think there are more options than malice or
               | incompetence. My theory is difficulty.
               | 
               | There are multiple countries with socialized medicine and
               | no profit motive and it's still not solved.
               | 
               | I think it's just really complex with high negative
               | consequences from a mistake. It takes lots of investment
               | with good coordination to solve and there's an "easy
               | workaround" with pdfs that distributes liability to
               | practitioners.
        
               | ethbr1 wrote:
               | Healthcare suffers from strict regulatory requirements,
               | underinvestment in organic IT capabilities, and huge
               | integration challenges (system-to-system).
               | 
               | Layering any sort of data standard into that environment
               | (and evolving it in a timely manner!) is nigh impossible
               | without an external impetus forcing action (read:
               | government payer mandate).
        
             | jjmarr wrote:
             | It doesn't look like the XML data is freely accessible.
             | 
             | If I could get access to this data as a random student on
             | the internet, I'd love to create an open source tool that
             | generates an interactive visualization.
        
             | zo1 wrote:
             | People could potentially properly implement them if they
             | were open and available:
             | 
             | "Contact the CAP for more information about licensing and
             | using the CAP electronic Cancer Protocols for cancer
             | reporting at your institution."
             | 
             | This stinks of the same gate-keeping that places like NIST
             | and ISO do, charging you for access to their "standards".
        
               | fl0id wrote:
               | For liability reasons alone, you cannot just have random
               | people working on health/lab stuff and the requisite
               | vendors have access to these standards.
        
               | joshuaissac wrote:
               | According to what killjoywashere said, the vendors do not
               | want to implement these standards. So if CAP wants the
               | standards to be relevant, they should release them for
               | random people to implement.
        
               | prepend wrote:
               | Aren't all NIST standards free as they are a government
               | body?
        
           | prepend wrote:
           | I believe you have good intentions, but someone would need to
           | build it out and sell it. And it requires lots of
           | maintenance. It's too boring for an open source community.
           | 
           | There's a whole industry that attempts to do what you do and
           | there's a reason why protocols keep getting punted back to
           | pdf.
           | 
           | I agree it would be great to release structured
           | representations. But I don't think there's a standard for
           | that representation, so it's kind of tricky as who will
           | develop and maintain the data standard.
           | 
           | I worked on a decision support protocol for Ebola and it was
           | really hard to get code sets released in Excel. Not to
           | mention the actual decision gates in a way that is
           | computable.
           | 
           | I hope we make progress on this, but I think the incentives
           | are off for the work to make the data structures necessary.
        
         | KPGv2 wrote:
         | You say this, but on the other hand, the author alleges that
         | the places that use these custom tools achieve better outcomes.
         | You didn't address this point one way or the other.
         | 
         | Do you think this is a completely fabricated non-explanation?
         | It's not like the link says "the worst places use these custom
         | tools."
        
         | ahardison wrote:
         | Totally valid concerns. If you have time, I would like to show
         | you my solution to get your thoughts as I believe I have found
         | ways to mitigate all of your concerns. Currently I am using
         | STCC (Schmitt-Thompson Clinical Content). I have sent you some
         | of the PDFs we use for testing.
        
         | zahlman wrote:
         | It would, I imagine, be much easier to generate a PDF from the
         | tool's internal flowchart representation than the other way
         | around.
        
         | Spooky23 wrote:
         | I think there's value if it can scale down.
         | 
         | Community oncologists have limited technology resources as
         | compared to a national cancer center. If we can make their
         | lives easier, it can only be a good thing.
         | 
         | That said, I like published documents like PDFs - systems
         | usually make it hard to compare the June release with the
         | September release.
        
         | layer8 wrote:
         | I agree. However, since the PDF format supports structured
         | data, one could in principle have it both ways, within a single
         | file.
        
           | queuebert wrote:
           | ^ This. See, e.g., https://lab6.com/ for some interesting
           | tricks with the PDF format.
        
       | upghost wrote:
       | It's so much worse than you could possibly imagine. I worked for
       | a healthcare startup working on patient enrollment for clinical
       | oncology trials. The challenges are amazing. Quite frankly it
       | wouldn't matter if the data were in plaintext. The diagnostic
       | codes vary between providers, the semantic understanding of the
       | diagnostic information has different meanings between providers,
       | electronic health records are a mess, things are written entirely
       | in natural language rather than some kind of data structure.
       | Anyone who's worked in healthcare software can tell you way more
       | horror stories.
       | 
       | I do hope that LLMs can help straighten some of it out, but as
       | anyone who's done healthcare software knows, the problems are not
       | technical, they are quite human.
       | 
       | That being said one bright spot is we've (my colleagues, not me)
       | made a huge step forward using category theory and Prolog to
       | discover the provably optimal 3+3 clinical oncology dose
       | escalation trial protocol[1]. David gave a great presentation on
       | it at the Scryer Prolog meetup[2] in Vienna.
       | 
       | It's kind of amazing how in the dark ages we are with medicine.
       | Even though this is the first EXECUTABLE/PROGRAMMABLE SPEC for a
       | 3+3 cancer trial, he is still fighting to convince his medical
       | colleagues and hospital administrators that this is the optimal
       | trial because -- surprise -- they don't speak software (or
       | statistics).
       | 
       | [1]: https://arxiv.org/abs/2402.08334
       | 
       | [2]: https://www.digitalaustria.gv.at/eng/insights/Digital-
       | Austri...
        
         | slaucon wrote:
         | This is a fascinating idea!
        
         | sebmellen wrote:
         | Have you read Jake Seliger's pieces on oncology clinical trials
         | https://jakeseliger.com/.
        
           | upghost wrote:
           | Oh wow. No, that's heart breaking. I'll have to read up on
           | this. Reminds me of David explaining the interesting and
           | somewhat surprisingly insensitive language the oncology
           | literature uses towards folks going through this. It's there
           | for historical reasons but slow to change.
           | 
           | It also shows how important it is to get dose escalation trials
           | right. The whole point is finding the balance point where "cure
           | is NOT worse than the disease". A bad dose can be worse than
           | the cancer itself, and conducting the trials correctly is
           | extremely important... and this really underscores the human
           | cost. Truly heartbreaking :(
        
       | londons_explore wrote:
       | Decision trees work for making decisions...
       | 
       | But they don't work as well as other decision-making techniques...
       | Random forests, linear models, neural nets, etc. are all
       | decision-making techniques at their core.
       | 
       | And decision trees perform poorly for complex systems where lots
       | of data exists - i.e. human health.
       | 
       | So why are we using a known-inferior technique simply because
       | it's easier to write down in a PDF file, reason about in a
       | meeting, or explain to someone?
       | 
       | Shouldn't we be using the most advanced mathematical models
       | possible with the highest 'cure' probability, even if they're so
       | complex no human can understand them?
        
         | wizzwizz4 wrote:
         | Models too complex for humans to understand don't, in practice,
         | have a high 'cure' probability.
        
         | s1artibartfast wrote:
         | Dinner generation is usually based on decision tree models as
         | well, so they match the resolution of the available data.
         | 
         | The practice of real world medicine often interpolates between
         | these data points.
        
         | epcoa wrote:
         | > complex systems where lots of data exists
         | 
         | Not a lot of high quality data exists for human health.
         | Clinical guidelines for many diseases are built around
         | surprisingly scant evidence many times.
         | 
         | > even if they're so complex no human can understand them?
         | 
         | That'll be wonderful to explain in court when they figure out
         | it was just data smuggling or whatever other bias.
        
           | epistasis wrote:
           | In cancer there's an abundance of clinical trials with high
           | quality data, but it is all very _complex_ in terms of
           | encoding what the clinical trial actually encoded.
           | 
           | Go to a clinical cancer conference and you will see the grim
           | reality of 10,000s of people contributing to the knowledge
           | discovery process with their cancer care. There is an inverse
           | relationship between the number of people in a trial and the
           | amount of risk that goes into that trial, but it is still a
           | massive amount of data that needs to be codified into some
           | sensible system, and it's hard enough for a person to do it.
           | 
           | > That'll be wonderful to explain in court when they figure
           | out it was just data smuggling or whatever other bias.
           | 
           | What do you mean by this? I'm not aware of any data smuggling
           | that has ever happened in a clinical trial. The "bias" is
           | that any research hypothesis comes from the fundamentally
           | biased position of "I think the data is telling me this" but
           | I've seen very little bias of truly bad hypotheses in cancer
           | research like those that have dominated, say Alzheimer's
           | research. Any research malfeasance should be prosecuted to
           | the fullest, but I don't think cancer research has much of
           | it. This was a huge scandal, but I don't think it pointed to
           | much in the way of bad research in the end:
           | 
           | https://www.propublica.org/article/doctor-jose-baselga-
           | cance...
        
             | epcoa wrote:
             | By smuggling and bias I meant in an ML model. Smuggling was
             | a bit informal, but referring to models overfit on
             | unintended features or artifacts.
        
               | londons_explore wrote:
               | But we have well established ways to deal with those...
               | test/validation sets, n-fold validation, etc.
               | 
               | Even if there was some overfitting or data contamination
               | that was undetected, the result would most probably still
               | be better than a hand-made decision tree over the same
               | data...
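
For readers unfamiliar with the n-fold validation mentioned above, here is a minimal sketch in plain Python. The function names, the accuracy metric, and the toy majority-label model are assumptions made for illustration; a real study would use an established library and a clinically meaningful metric. It shows only the mechanics of holding each fold out once. It does not address the interpretability and liability objections raised elsewhere in the thread.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, disjoint folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return per-fold accuracy; each fold is held out once and never used for fitting."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Toy model for illustration only: always predict the most common training label.
def fit_majority(train_x, train_y):
    return max(set(train_y), key=train_y.count)


def predict_majority(model, x):
    return model
```

A large gap between training accuracy and the held-out scores returned by `cross_validate` is the usual sign of overfitting. Artifacts that are present in every fold, such as a scanner watermark correlated with the label, would not be caught this way.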
        
               | wizzwizz4 wrote:
               | Hand-made decision trees are open to inspection,
               | comprehension, and adaptation. There is no way to adapt an
               | opaque ML model to new findings / an experimental
               | treatment except by producing a new model.
        
               | epcoa wrote:
               | Ok, until you can sue the AI you need to find a doctor ok
               | putting their license behind saying "I have no idea how
               | this shiny thing works". There are indeed some that will,
               | but not a consensus.
        
       | a1o wrote:
       | I parsed some mind maps that were constructed with a tool and
       | exported as pdfs (original sources were lost a long time ago) and
       | I used python with tesseract for the text and opencv and it
       | worked alright. I am curious why the author went with LLMs, but I
       | guess with the mentioned amount of data it wasn't hard to recheck
       | everything later.
        
       | inopinatus wrote:
       | > The whole set of guidelines for a type of cancer breaks down
       | into a few disjointed directed graphs
       | 
       | Nothing undermines medicine quite so thoroughly as yet another
       | astronaut trying to force it into a data structure.
        
         | prepend wrote:
         | Comically, I worked in this space and initially tried to get
         | decision support working with data structures and code sets and
         | such.
         | 
         | I ended up only really contributing version numbers to
         | the pdf. So at least people knew they had the latest and same
         | versions. And that took a year, to get versions added to
         | guideline pdfs.
        
           | johnisgood wrote:
           | That is wild, one would think versioning is extremely
           | important. They tend to just put the timestamp in the
           | filename (sometimes), which I guess is better than nothing.
           | 
           | Don't signed PDFs include a timestamp, however?
        
             | prepend wrote:
             | Getting in the file name was kind of easy. But I meant
             | adding it visually in the pdf guidance so readers could
             | tell. Just numbers in the lower left corner. Or maybe
             | right.
             | 
             | The guideline was available via url so the filename
             | couldn't change.
        
       | LorenPechtel wrote:
       | The real problem is that the guidelines are written for humans in
       | the first place. Workarounds like this shouldn't be needed, to go
       | from a machine friendly layout to a human friendly one is usually
       | quite easy.
       | 
       | And from what he says a decision tree isn't really the right
       | model in the first place. What about no tree, just a heap of
       | records in a SQL database. You do a query on the known
       | parameters, if the response comes back with only one item in the
       | treatment column you follow it. If it comes back with multiple
       | items you look at what would be needed to distinguish them and do
       | the test(s).
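       | 
       | A rough sketch of the "heap of records" idea above, using
       | Python's built-in sqlite3. The table, column names, and rows are
       | made up for illustration and are not real clinical guidance.

```python
import sqlite3

# Hypothetical rows: (stage, marker_status, treatment). Not real guidance.
ROWS = [
    ("early", "positive", "treatment_A"),
    ("early", "negative", "treatment_B"),
    ("late", "positive", "treatment_C"),
    ("late", "negative", "treatment_C"),
]
COLUMNS = ("stage", "marker_status")  # parameters a clinician could supply


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE guideline (stage TEXT, marker_status TEXT, treatment TEXT)"
    )
    conn.executemany("INSERT INTO guideline VALUES (?, ?, ?)", ROWS)
    return conn


def _where(known: dict) -> str:
    # Column names are checked against COLUMNS before being put into SQL.
    unknown = set(known) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return " AND ".join(f"{col} = ?" for col in known) or "1 = 1"


def candidate_treatments(conn: sqlite3.Connection, known: dict) -> list:
    """Distinct treatments consistent with the known parameters."""
    sql = (
        f"SELECT DISTINCT treatment FROM guideline "
        f"WHERE {_where(known)} ORDER BY treatment"
    )
    return [row[0] for row in conn.execute(sql, tuple(known.values()))]


def distinguishing_parameters(conn: sqlite3.Connection, known: dict) -> list:
    """Not-yet-known columns that still vary among the matching rows."""
    result = []
    for col in COLUMNS:
        if col in known:
            continue
        sql = f"SELECT COUNT(DISTINCT {col}) FROM guideline WHERE {_where(known)}"
        (count,) = conn.execute(sql, tuple(known.values())).fetchone()
        if count > 1:
            result.append(col)
    return result
```

       | With these rows, knowing only stage = "late" yields a single
       | treatment, while stage = "early" yields two and points at
       | marker_status as the test that would separate them.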
        
       | noonanibus wrote:
       | Forgive me if I'm mistaken, but isn't this exactly what the FHIR
       | standard is meant to address? Not only does it enable global
       | inter-health communication using a standardized resource, but
       | it's already adopted in several national health services,
       | including (but not broadly), America. Is this not simply a
       | reimplementation, but without the broad iterations of HL7?
        
         | nradov wrote:
         | Right, it would make more sense to use HL7 FHIR (possibly along
         | with CQL) as a starting point instead of reinventing the wheel.
         | Talk to the CodeX accelerator about writing an Implementation
         | Guide in this area. The PlanDefinition resource type should be
         | a good fit for modeling cancer guidelines.
         | 
         | https://codex.hl7.org/
         | 
         | https://www.hl7.org/fhir/plandefinition.html
        
           | joshuakelly wrote:
           | This is the comment I was looking for.
           | 
           | You would aim to use CQL expressions inside of a
           | PlanDefinition, in my estimate. This is exactly what AHRQ's,
           | part of HHS, CDS Connect project aims to create / has
           | created. They publish freely accessible computable decision
           | support artifacts here:
           | https://cds.ahrq.gov/cdsconnect/repository
           | 
           | When they are fully computable, they are FHIR PlanDefinitions
           | (+ other resources like Questionnaire, etc) and CQL.
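           | 
           | For readers who have not seen one, here is a rough sketch
           | of the general shape, written as a Python dict. The field
           | names follow my reading of FHIR R4 (action, condition,
           | kind, expression) and should be checked against the spec;
           | the title and the CQL text are hypothetical placeholders,
           | not a real artifact.

```python
import json

# Hypothetical PlanDefinition fragment; content is illustrative only.
plan_definition = {
    "resourceType": "PlanDefinition",
    "status": "draft",
    "title": "Hypothetical guideline fragment",
    "action": [
        {
            "title": "Offer hypothetical screening questionnaire",
            "condition": [
                {
                    "kind": "applicability",
                    "expression": {
                        "language": "text/cql",
                        # Placeholder CQL; a real artifact would reference
                        # logic defined in an accompanying CQL library.
                        "expression": "AgeInYears() >= 18",
                    },
                }
            ],
        }
    ],
}


def applicability_expressions(plan: dict) -> list:
    """Collect applicability expressions from all (possibly nested) actions."""
    found = []
    stack = list(plan.get("action", []))
    while stack:
        action = stack.pop()
        for condition in action.get("condition", []):
            if condition.get("kind") == "applicability":
                found.append(condition["expression"]["expression"])
        stack.extend(action.get("action", []))
    return found


if __name__ == "__main__":
    print(json.dumps(plan_definition, indent=2))
```

           | The point of the structure is that the "when does this
           | step apply" logic lives in an expression a CQL engine can
           | evaluate, rather than in prose.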
           | 
           | Here's an example of a fully executable Alcohol Use Disorder
           | Identification Test:
           | https://cds.ahrq.gov/cdsconnect/artifact/alcohol-
           | screening-u...
           | 
           | There's so much other infrastructure around the EHR here to
           | understand (and take advantage of). I think there's a big
           | opportunity in proving that multimodal LLM can reliably
           | generate these artifacts from other sources. It's not the LLM
           | actually being a decision support tool itself (though that
           | may well be promising), but rather the ability to generate
           | standardized CDS artifacts in a highly scalable, repeatable
           | way.
           | 
           | Happy to talk to anyone about any of these ideas - I started
           | exactly where OP was.
        
             | osmano807 wrote:
             | I downloaded and opened a CDS for osteoporosis from the
             | link (as a disease in my specialty). I need an API key to
             | view what a "valueset" entails, so in practice I couldn't
             | assess whether the recommendation aligns with clinical
             | practice, nor does the CQL provided have any references
             | (even a textbook or a weak recommendation from a guideline
             | would be sufficient, I don't think the algorithm should be
             | the primary source of the knowledge)
             | 
             | I tried to see if HL7 was approachable for small teams, I
             | personally became exhausted from reading it and trying to
             | think how to implement a subset of it, I know it's
             | "standard" but all this is kinda unapproachable.
        
               | nradov wrote:
               | You can register for a free NLM account to access the
               | value sets (VSAC). HL7 standards are approachable for
               | small teams but due to the inherent complexity of
               | healthcare it can take a while to get up to speed. The
               | FHIR Fundamentals training course is a good option for
               | those who are starting out.
               | 
               | https://www.hl7.org/training/fhir-
               | fundamentals.cfm?ref=nav
               | 
               | It might seem tempting to avoid the complexity of FHIR
               | and CQL by inventing your own simple schema or data
               | formats for a narrow domain. But I guarantee that what
               | you thought was simple will eventually grow and grow
               | until you find that you've reinvented FHIR -- badly. I've
               | seen that happen over and over in other failed projects.
               | Talk to the CodeX accelerator I linked above and they
               | should be able to get you pointed in the right direction.
        
       | jdlyga wrote:
       | PDFs are a universal, machine readable format.
        
         | sswatson wrote:
         | They're only machine-readable in the very weak sense that all
         | computer files are machine-readable.
        
         | GeneralMayhem wrote:
         | PDFs are the opposite of machine-readable if you want to do
         | anything other than render them as images on paper or a screen.
         | They're only slightly more machine-readable than binary
         | executables.
         | 
         | I hate, hate, hate, hate, _hate_ the practice of using PDFs as
         | a system of record. They are intended to be a print format for
         | ensuring consistent typesetting and formatting. For that, I
         | have no quarrel. But so much of the world economy is based on
         | taking text, docx (XML), spreadsheets, or even _CSV_ files,
         | rendering them out as PDFs, and then emailing them around or
         | storing them in databases. They've gone from being simply a
         | view layer to infecting the model layer.
         | 
         | PDFs are a step better than passing around screenshots of text
         | as images - when they don't literally consist of a single
         | image, that is. But even for reasonably-well-behaved, mostly-
         | text PDFs, finding things like "headers" and "sections" in the
         | average case is dependent on a huge pile of heuristics about
         | spacing and font size conventions. None of that semantic
         | structure exists, it's just individual characters with X-Y
         | coordinates. (My favorite thing to do with people starting to
         | work with PDFs is to tell them that the files don't usually
         | contain any whitespace characters, and then watch the horror
         | slowly dawn as they contemplate the implications.) (And yes, I
         | know that PDF/A theoretically exists, but it's not reliably
         | used, and certainly won't exist on any file produced more than
         | a couple years ago.)
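         | 
         | To make the "no whitespace" point concrete, here is a toy
         | sketch of the kind of heuristic involved: given glyphs with
         | x positions and widths, guess where the spaces were. The
         | tuple layout and the gap threshold are assumptions for
         | illustration, not how any particular PDF library works.

```python
from typing import List, Tuple

Glyph = Tuple[str, float, float]  # (character, x position, advance width)


def glyphs_to_text(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Rebuild one line of text from positioned glyphs.

    A space is inserted wherever the horizontal gap between two consecutive
    glyphs exceeds gap_ratio times the width of the earlier glyph.
    """
    ordered = sorted(glyphs, key=lambda glyph: glyph[1])  # left to right
    pieces = []
    prev_end = None
    prev_width = 0.0
    for char, x, width in ordered:
        if prev_end is not None and x - prev_end > gap_ratio * prev_width:
            pieces.append(" ")
        pieces.append(char)
        prev_end = x + width
        prev_width = width
    return "".join(pieces)


# Four glyphs with a visible gap between the second and third.
line = [("T", 0.0, 6.0), ("o", 6.0, 5.0), ("d", 14.0, 5.0), ("o", 19.0, 5.0)]
```

         | Real documents add kerning, multiple columns, rotated text
         | and ligatures on top of this, which is where the pile of
         | heuristics comes from.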
         | 
         | Now, with multi-modal LLMs and OCR reaching near-human levels,
         | we can finally... attempt to infer structured data back out
         | from them. So many megawatt-hours wasted in undoing what was
         | just done. Structure to unstructure to structure again. Why,
         | why, why.
         | 
         | As for universality... I mean, sure, they're better than some
         | proprietary format that can only be decrypted or parsed by one
         | old rickety piece of software that has to run in Win95
         | compatibility mode. But they're not better than JSON or XML if
         | the source of truth is structured, and they're not better than
         | Markdown or - again - XML if the source is mostly text. And
         | there are always warts that aren't fully supported depending on
         | your viewer.
        
       | tdeck wrote:
       | GraphViz has some useful graph schema languages that could be
       | reused for something like this. There's DOT, a delightful DSL,
       | and some kind of JSON format as well. You can then generate a
       | bunch of different output formats and it will lay out the nodes
       | for you.
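       | 
       | As a small illustration of the DOT route, here is a sketch that
       | turns labeled edges into DOT text without any library. The
       | function name and the sample edges are hypothetical, and only
       | double quotes are escaped.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str, str]], name: str = "guideline") -> str:
    """Emit a DOT digraph from (source, target, edge label) triples."""

    def quote(text: str) -> str:
        # DOT quoted strings only need embedded double quotes escaped here.
        return '"' + text.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for source, target, label in edges:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical decision fragment, not real guidance.
example = to_dot(
    [
        ("Biopsy result?", "Option A", "positive"),
        ("Biopsy result?", "Option B", "negative"),
    ]
)
```

       | Saving the result as guideline.dot and running
       | "dot -Tsvg guideline.dot -o guideline.svg" should produce a
       | laid-out diagram.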
        
         | epistasis wrote:
         | Of all the challenges with this, graph layout is beyond
         | trivial. It does not rank as a problem, intellectual challenge,
         | or even that interesting.
         | 
         | The challenges are all about what goes in the nodes, how to
         | define it, how to standardize it across different institutions,
         | how to compare it to what was tested in two different clinical
         | trials, etc. And if the computerized process goes into clinical
         | practice, how is that node and its contents robustly defined so
         | that a clinician sitting with a patient can _instantly_
         | understand what is meant by its yes/no/multiple choice
         | question in terms that have been used in recent years at the
         | clinician's conferences.
         | 
         | Addressing the challenges of constructing the graph requires
         | deep understanding of the terms, deep knowledge of how 10
         | different people from different cultural backgrounds and
         | training locations interpret highly technical terms with
         | evolving meanings, and deep knowledge of how people could
         | misunderstand language or logic.
         | 
         | These guidelines codify evolving scientific knowledge where new
         | conceptions of the disease get invented at every conference.
         | It's all at the edge of science where every month and year we
         | have new technology to understand more than we ever understood
         | before, and we have new clinical trials that are testing new
         | hypotheses at the edge of it.
         | 
         | Getting a nice visual layout is necessary, but in no way
         | sufficient for what needs to be done to put this into practice.
        
           | graphviz wrote:
           | Not ... even that interesting?
        
             | graphviz wrote:
             | Modularity is an excellent way of attacking complex
             | problems. We can all play with algorithms that can carry on
             | realistic conversations and create synthetic 3D movies,
             | because people worked on problems like making transistors
             | the size of 10 atoms, figuring out how processors can
             | predict branches with 99% accuracy, giving neural nets
             | self-attention, deploying inexpensive and ridiculously fast
             | networks all over the planet, and a lot of other stuff.
             | 
             | For many of us, curing cancer may someday become more
             | important than almost anything else a computer can help us
             | to do. It's just there are so many building blocks to
             | solving truly complex problems; we must respect all that.
        
       | dogmatism wrote:
       | This is all predicated on the guidelines actually reflecting best
       | practices
        
       | epistasis wrote:
       | > With properly structured data, machines should be able to
       | interpret the guidelines. Charting systems could automatically
       | suggesting diagnostic tests for a patient. Alarm bells and "Are
       | you sure?" modals could pop up when a course of treatment
       | diverges from the guidelines. And when a doctor needs to review
       | the guidelines, there should be a much faster and more natural
       | way than finding PDFs
       | 
       | I have implemented this computerized process _twice_ at two
       | different startups over the past decade.
       | 
       | I would not want the NCCN to do it.
       | 
       | The NCCN guidelines are not stuck in PDFs, they are stuck in the
       | heads of doctors.
       | 
       | Once the NCCN guidelines get put into computerized rules, they
       | start to be guided _by_ those computerized rules, a second
       | influence that takes them away from the fundamental science.
       | 
       | So while I totally agree that there should be systematization
       | of the rules, it should be entirely secondary and subservient
       | _to_ the best frontier knowledge about cancer, which changes
       | _extremely_ frequently. Annually after every ASCO (major pan-
       | cancer conference) and every disease specific conference (e.g.
       | the San Antonio breast cancer conference), and occasionally
       | during the year when landmark clinical trials are published the
       | doctors need to update their knowledge from the latest trials and
       | their continuing medical education, which is an entire body of
       | knowledge that is complementary to the edges of what the NCCN
       | publishes.
       | 
       | Having spanned both computer science and medicine for my entire
       | career, I trust doctors to be able to update their rules far
       | faster than the programmers and databases.
       | 
       | Please do not get the NCCN guidelines stuck in spaghetti code
       | that a few programmers understand, rather than open in PDFs with
       | lots of links that anybody can go and chase after.
       | 
       | Edit: though give me a week digesting this article and I may
       | change my mind. Maybe the NCCN should be standardizing clinical
       | variables enough such that the rules can trivially be turned into
       | rules. That would require that the hypotheses of a clinical
       | trial fit into those rules, however, and that's why I need a week
       | of digestion to see if it may even be possible...
        
       | bsder wrote:
       | Gee, before talking about complex stuff like decision trees, how
       | about we start with something _really_ simple like _not requiring
       | a login to download the stupid PDF from NCCN_?
        
       | joshz404 wrote:
       | You might be interested in checking out the WHO SMART Guidelines.
       | Nothing on cancer yet AFAIK, but it's evolving.
        
         | rukshn wrote:
         | I was also thinking about FHIR and SMART guidelines.
         | 
         | But the whole system is a mess. And the whole SMART guideline
         | system is controlled by 2-3 gatekeepers who don't listen to any
         | ideas other than their own
        
       | xh-dude wrote:
       | The author makes a great case for machine-interpretable standards
       | but there is an _enormous_ amount of work out there devoted to
       | this, it's been a topic of interest for decades. There's so much
       | in the field that a real problem is figuring out what solutions
       | match the requirements of the various stakeholders, more than
       | identifying the opportunities.
        
       | hulitu wrote:
       | > With properly structured data, machines should be able to
       | interpret the guidelines.
       | 
       | Yeah, right. And then say "Die". /s
       | 
       | The guidelines shall be structured properly. It is not rocket
       | science.
        
       | grumbel wrote:
       | Same reason why datasheets are still PDFs. It's a reliable, long
       | lasting and portable format. And while it's kind of ridiculous
       | that we are basically emulating paper, no other format fills that
       | niche.
       | 
       | It's the niche HTML should be able to fill, since that was its
       | original purpose, but isn't, since all focus over the last 20 or
       | so years has been on everything else, but making HTML a better
       | format for information exchange.
       | 
       | Trivial things like bundling up a complex HTML document into a
       | single file don't have standard solutions. Cookies stop working
       | when you are dealing with file:// URLs and a lot of other really
       | basic stuff just doesn't work or doesn't exist. Instead you get
       | offshoot formats like ePUB that are mostly HTML, but not actually
       | supported by most browsers.
        
       | schu wrote:
       | Would love to take a look at the code, in particular at how the
       | data extraction and transformation is implemented.
       | 
       | As a side note, the German associations of oncology publish their
       | guidelines here (HTML and SVG graphs):
       | https://www.onkopedia.com/de/onkopedia/guidelines
        
       | rmrfchik wrote:
       | Because writers don't think about readers. PDF is one of the
       | worst formats for science/technical info, but yet. I've dumped a
       | lot of papers from arxiv because they were formatted as 2-column
       | non-zoomable PDFs.
        
       | troysk wrote:
       | I find the web(HTML/CSS) the most open format for sharing. PDFs
       | are hard to be consumed on smaller devices and much harder to be
       | read by machines. I am working on a feature at Jaunt.com to
       | convert PDFs to HTML. It shows up as reader mode icon. Please try
       | it out and see if it is good enough. I personally think we need
       | to do a much better job. https://jaunt.com
        
         | ErigmolCt wrote:
         | PDFs can be notoriously difficult to work with on smaller
         | devices
        
       | breytex wrote:
       | Shouldn't the end goal be just to train an ai on all the pdfs and
       | give the doctors an interface to plug in all the details and get
       | a treatment plan generated by that ai?
       | 
       | Working on the data structure feels like an intermediate solution
       | on the way to that ai which is not really necessary. Or am I
       | missing something?
        
         | fl0id wrote:
         | Your end goal maybe. Not patients' or doctors' goal for sure.
        
         | pjc50 wrote:
         | How does your treatment AI get its liability insurance?
        
         | prmoustache wrote:
         | I am not sure patients and doctors are interested in adding
         | hallucination generators to the list of their problems.
        
         | ska wrote:
         | AI/ML techniques in medicine have been applied clinically since
         | at least the 90s. Part of the reason you don't see them used
         | ubiquitously is a combination of a) it hasn't worked all that
         | well in many scenarios so far and b) medicine is by nature
         | quite conservative, for a mix of good and not so good reasons.
        
       | whiterock wrote:
       | Why can this not just be a website? Isn't this a perfect use case
       | for HTML and hyperlinks?
        
       | mav3ri3k wrote:
       | Excellent read. This consolidated and catalyzed my spurious
       | thoughts around personal information management. The input is
       | generally markdown/pdf but over time highly useless for a single
       | person. There would be value if it is passed through such a
       | system over time.
        
       | ramoz wrote:
       | Cool tool. From my experience the PDF was easy to traverse.
       | 
       | The hardest part for me was understanding that treatment options
       | could differ (i.e. between the _top_ hospitals treating the
       | cancer). And there were a few critical options to consider. NCCN
       | paths were traditional, but there are in-between decisions to make
       | or alternative paths. ChatGPT was really helpful in that period.
       | "2nd" opinions are important... but again you ask the top 2
       | hospitals and they differ in opinion, any other hospital is
       | typically in one of those camps.
        
       | hashishen wrote:
       | Funny i just had the thought the other day about how we as a
       | society need to move past the pdf format or even just update it
       | to be editable in traditional document software. The fact that
       | Google docs will export as a pdf and not have it saved in the
       | documents is proof it's gotten to a point of inefficiency and
       | that's just one example
        
       | easytigerm wrote:
       | The OP will be pleased to know that they're not the first person
       | to think of this idea. Searching for "computable clinical
       | guidelines" will unearth a wealth of academic literature on the
       | subject. A reasonable starting point would be this paper [1].
       | Indeed people have been trying since the 70s, most notably with
       | the famous MYCIN expert system. [2]
       | 
       | As people have alluded to and the history of MYCIN shows, there's
       | a lot more subtlety to the problem than appears on the surface,
       | with a whole bunch of technical, psychological, sociological and
       | economic factors interacting. This is why cancer guidelines are
       | stuck in PDFs.
       | 
       | Still, none of that should inhibit exploration. After all, just
       | because previous generations couldn't solve a problem doesn't
       | mean that it can't be solved.
       | 
       | [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC10582221/
       | 
       | [2] https://www.forbes.com/sites/gilpress/2020/04/27/12-ai-
       | miles...
        
         | adolph wrote:
         | To the author:
         | 
         | The above is a high quality comment with worthy areas to study.
         | 
         | Additionally I would draw your attention to NCCN's "Developer
         | API" which is not interesting technologically but how it
         | reflects the IP landscape.
         | 
         | https://www.nccn.org/developer-api
        
       | fasa99 wrote:
       | WAIT ... Hole up... what have we here:
       | https://www.nccn.org/compendia-templates/compendia/nccn-comp...
       | 
       | TLDR: The NCCN surely has a clean pretty database of these
       | algorithms. They output these junky pdfs for free. Want cleaner
       | "templates" data? Pay the toll please.
       | 
       | What we have here is a walled garden. Want the treatment
       | algorithm? Here muck through this huge disaster of 999 page pdfs.
       | Oh you want the underlying data? Well, well, it's going to cost
       | you.
       | 
       | What we have here is not so much different than the paywalls of
       | an academic journal. Some company running a core service to an
       | altruistic industry and skimming a price. OP is just writing an
       | algorithm to unskim it. And nobody can really use it without
       | making the thing bulletproof lest a physician mistreat a cancer.
       | 
       | To my sentiment this is yet another unethical topic in
       | healthcare. These clunky algorithms, if a physician uses them,
       | slow the process and introduce a potential source of error,
       | ultimately harming patients. Harming patients for increased
       | revenue. The physicians writing and maintaining the guidelines
       | look the other way given they get a paycheck off it, plus the
       | prestige of it all, similar to some scenarios in medicine itself.
       | 
       | The natural thing to do is crack open the database and let
       | algorithms utilize it. This whole thing of dumping data in an
       | abstruse and machine-challenging format, then a Rube Goldberg
       | machine to reverse the transformation, it's not right.
       | 
       | Anyway I mention this because there seems to be a thought of
       | "these pdfs are messy, let's clean them" without looking at what's
       | really going on here.
        
         | persona wrote:
         | OP is talking about the NCCN Guidelines, which don't seem to
         | be available in other formats or API. From their website:
         | 
         | NCCN Clinical Practice Guidelines in Oncology (NCCN
         | Guidelines®): The NCCN Guidelines® document evidence-based,
         | consensus-driven management to ensure that all patients receive
         | preventive, diagnostic, treatment, and supportive services that
         | are most likely to lead to optimal outcomes.
         | 
         | Format(s) Available for Licensing: PDF API not available
        
       | gcanyon wrote:
       | The real question is: why is _everything_ stuck in PDFs, and the
       | more important meta-question is: why don't PDFs support
       | meta-data (they do, somewhat). So much of what we do is essentially
       | machine-to-machine, but trapped in a format designed entirely for
       | human-to-human (also lump in a bit of machine-to-human).
       | 
       | Adobe has had literally a third of a century to recognize this
       | need and address it. I don't think they're paying attention :-/
        
         | layer8 wrote:
         | PDFs can have arbitrary files embedded, like XML and JSON. It
         | also supports a logical structure tree (which doesn't need to
         | correspond to the visual structure) which can carry arbitrary
         | attributes (data) on its structure elements. And then there's
         | XML Forms. You can really have pretty much anything machine-
         | processable you want in a PDF. One could argue that it is _too_
         | flexible, because any design you can come up with that uses
         | those features for a particular application is unlikely to be
         | very interoperable.
        
         | queuebert wrote:
         | PDFs are essentially compressed Postscript, which is Turing
         | complete, so a PDF in theory can do anything you want.
        
       | gmueckl wrote:
       | Software that gives treatment instructions may be a medical
       | device requiring FDA approval. You may be breaking the law if you
       | give it to a medical professional without such approval.
        
       | gibsonf1 wrote:
       | The idea of adding hallucination to medical advice seems very
       | dangerous.
        
         | xh-dude wrote:
         | There's also a regression-to-the-mean problem, the systems
         | really shouldn't optimize just for the easier cases. I wonder
         | if that's a direct tradeoff, I think maybe it is with the kinds
         | of things I see used to tweak out hallucinations.
        
       | guipsp wrote:
       | I have to ask: did the author contact any medical professional
       | when writing this article? Is this really something that needs to
       | be fixed, and will his solution actually fix it?
       | 
       | It seems to me that ignoring the guideline is a physician
       | decision, and when it is ignored (for good or for bad), it is not
       | because the guidelines are not available in json.
        
       | queuebert wrote:
       | As a cancer researcher myself, I'd point out that some branches
       | of the decision trees in the NCCN guidelines are based on studies
       | in which multiple options were not statistically significantly
       | different, but all were better than the placebo. In those cases,
       | the clinician is free to use other factors to decide which arm to
       | take. A classic example of this is surgery vs radiation for
       | prostate cancer. Both are roughly equally effective, but very
       | different experiences.
        
       | awinter-py wrote:
       | a decision tree is just a csv trapped in amber. share the actual
       | data
        
       | anigbrowl wrote:
       | Why isn't all human knowledge in one big JSON file? Guidelines
       | are decision trees, but they're not written to be applied by
       | rote, because the identical patients with identical cancers
       | posited in the hypothetical _don't exist_. The guidelines are
       | not written for maverick clinician movie protagonists to navigate
       | the decision tree in real time while racing against an
       | oncological clock, they're for teams of clinicians who are very
       | well trained and used to working with each other, _and_ who have
       | the skills to notice where the guidelines might be wrong or need
       | expansion or modification. That is, they're abstractions of the
       | state-of-the-art.
       | 
       | Now it'd be nice if these could be treated like source code on
       | some sort of medical VCS, to be easily modded or even forked by
       | sub-specialists. But wetware dependencies are way more
       | complicated than silicon ones, and will remain harder to
       | discretize for some time to come.
       | 
       | It's not that the author's aspirations are misguided, they're great.
       | But I believe progress in this area is most easily realized as
       | part of a relevant team, because what looks conceptually simple
       | from outside the system only seems that way because of a lack of
       | resolution.
        
       | amai wrote:
       | Why is anything stuck in PDFs?
       | 
       | PDFs are good for just one thing: printing. Data stored as
       | PDF is meant to be printed and not to be processed by any other
       | means.
        
       | inportb wrote:
       | Profit.
       | 
       | I use these guidelines all the time in the PDF format for free,
       | and I'd love to have these in a structured format. For $3000/year
       | you could get 50 users access to PDF prescription templates to
       | speed up their work. That's not bad, but it's still all PDF.
       | 
       | For the nice low price of "contact us for pricing," though, you
       | could have EMR integration. They couldn't justify $$$$ for EMR
       | integration if all this information is easily accessible.
       | 
       | https://www.nccn.org/compendia-templates/nccn-templates-main...
        
       | grovehaw wrote:
       | In the UK guidelines are published by the National Institute for
       | Clinical Excellence. Guidance is available to all in html and pdf
       | formats.[0]
       | 
       | [0] https://www.nice.org.uk/guidance
        
       | w10-1 wrote:
       | "At their core, guidelines are decision trees"
       | 
       | That's wishful and perhaps not even helpful as a goal. Guidelines
       | rarely have the data to cover all possible legs of decisions.
       | They report on well-supported findings, offer expert opinions on
       | some interpolated cases, and perhaps list factors to consider for
       | some of the remainder. If you reduced this to a decision tree,
       | you'd find many branches are not covered, and most experts could
       | identify factors that should lead to a more complex tree.
       | 
       | The reason is that branches are rarely definitive. It's more like
       | quantum probabilities: you have to hold them all at once, and
       | only when treatment works or doesn't does the disease (here
       | cancer) declare itself as such.
       | 
       | Until the true information architecture of guidelines is
       | captured, they will be conveyed as authoritative and educational
       | statements of the standard of care.
       | 
       | In almost all cases, it's more important to reduce latency and
       | increase transparency (i.e., publish faster but with references)
       | than to simplify or operationalize in order to improve uptake.
       | Most doctors in dynamic fields don't need the simplification;
       | they rely on life-long self-discipline and diligence to overcome
       | difficulty in the material, and use guidelines at most as a
       | framework for communication and completion, i.e., for knowing
       | when they've addressed known concerns.
       | 
       | Structured guidelines mainly enable outsiders to observe and
       | control in ways that are likely to be unproductive.
        
       ___________________________________________________________________
       (page generated 2024-12-24 23:00 UTC)