[HN Gopher] Why are cancer guidelines stuck in PDFs?
___________________________________________________________________
Why are cancer guidelines stuck in PDFs?
Author : huerne
Score : 249 points
Date : 2024-12-23 23:36 UTC (23 hours ago)
(HTM) web link (seangeiger.substack.com)
(TXT) w3m dump (seangeiger.substack.com)
| osmano807 wrote:
| I know it's not the same, but in many areas we have this "follow
| the arrows" system in guidelines. For some examples, see the
| EULAR guidelines with their flowcharts for treatments, and the AO
| Surgery Reference with its graphical approach to selecting
| treatments based on fracture pattern, available materials and
| skill set.
|
| I think that's a logical and necessary step toward joining
| medical reasoning and computer assistance: we need easier access
| to new information and, more importantly, to present clinically
| relevant facts from the literature in a way that helps actual
| patient care decision making.
|
| I'm just not too sure we can have generic approaches to all
| specialties, but it's nice seeing efforts in this area.
| pcrh wrote:
| The fundamental idea here is that doctors find it difficult to
| ensure that their recommendations are actually up-to-date with
| the latest clinical research.
|
| Further, that by virtue of being at the centre of action in
| research, doctors in prestigious medical centres have an
| advantage that _could_ be available to all doctors. It's a pretty
| important point, sometimes referred to as the dissemination of
| knowledge problem.
|
| Currently, this is best approached by publishing systematic
| reviews according to the Cochrane Criteria [0]. Such reviews are
| quite labour-intensive and done all too rarely, but are very
| valuable when done.
|
| One notable aspect of such reviews is how often they discard
| published studies for reasons such as bias, incomplete datasets,
| and so forth.
|
| The approach described by Geiger in the link is commendable for
| its intentions but the outcome will be faced with the same
| problem that manual systematic reviews face.
|
| I wonder if the author considered including rules-based
| approaches (e.g. Cochrane guidelines) in addition to machine
| learning approaches?
|
| [0] https://training.cochrane.org/handbook
| slaucon wrote:
| Hey author here--Cochrane reviews are great.
|
| NCCN guidelines and Cochrane Reviews serve complementary roles
| in medicine - NCCN provides practical, frequently updated
| cancer treatment algorithms based on both research and expert
| consensus, while Cochrane Reviews offer rigorous systematic
| analyses of research evidence across all medical fields with a
| stronger focus on randomized controlled trials. The NCCN
| guidelines tend to be more immediately applicable in clinical
| practice, while Cochrane Reviews provide a deeper analysis of
| the underlying evidence quality.
|
| My main goal here was to show what you could do with any set of
| medical guidelines that was properly structured. You can choose
| any criteria you want.
| liontwist wrote:
| > doctors find it difficult to ensure that their
| recommendations are actually up-to-date with the latest
| clinical research
|
| Doctors care about this about as much as software engineers care
| about the latest computer science research. A few curious ones
| do. But the general attitude is that they already did their tough
| years of school, so they don't have to anymore.
| refurb wrote:
| I worked with oncologists and this isn't true.
|
| Oncology has a rapidly changing treatment landscape and it's
| common for oncologists to be discussing the latest paper that
| has come out.
|
| If you're an oncologist and not keeping up with the literature,
| your decisions are going to be out of date within about 6 months
| of graduation.
| liontwist wrote:
| Funnily enough, that last paragraph is also said of software
| engineers. Neither is true.
| mort96 wrote:
| Yeah, non-programmers seem to think everything is changing so
| quickly all the time, yet here I am writing in a 40-year-old
| language against UNIX APIs from the '70s ¯\\_(ツ)_/¯
| resource_waste wrote:
| It amazes me that AI isn't a borderline requirement for being a
| doctor. Think of how much info is outdated or just wrong.
| prepend wrote:
| I'd rather have the PDF than a custom tool, especially
| considering the tool will be unique to the practice or EMR, and
| likely expensive to maintain.
|
| PDFs suck in many ways but are durable and portable. If I work
| with two oncologists, I use the same PDF.
|
| The author means well, but his solution will likely be worse
| because only he will understand it. And there are a million edge
| cases.
| akoboldfrying wrote:
| The author is proposing that the DAG representation be _in
| addition to_ the PDF:
|
| >The organizations drafting guidelines should release them in
| structured, machine-interpretable formats in addition to the
| downloadable PDFs.
|
| My opinion: Ideally the PDF could be _generated from_ the
| underlying DAG -- that would give you confidence that
| everything in the PDF has been captured in the DAG.
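|
| (An illustrative sketch of that idea, not anything from the
| article: assuming a simple, made-up node/edge schema, the same
| data could drive both a renderer and a rules engine. Node IDs,
| questions and treatments below are hypothetical.)
|
|     # Minimal sketch: a guideline as a graph of decision nodes, with
|     # the human-readable document generated from that same data.
|     # All field names and example content are hypothetical.
|     from dataclasses import dataclass, field
|
|     @dataclass
|     class Node:
|         node_id: str
|         question: str
|         # answer -> next node_id, or a treatment string for leaves
|         options: dict = field(default_factory=dict)
|
|     GUIDELINE = {
|         "n1": Node("n1", "Tumor resectable?",
|                    {"yes": "n2", "no": "Refer for systemic therapy"}),
|         "n2": Node("n2", "Margins clear after surgery?",
|                    {"yes": "Surveillance", "no": "Adjuvant radiation"}),
|     }
|
|     def render_outline(graph, node_id, depth=0):
|         """Walk the graph and emit an outline a PDF could be typeset from."""
|         node = graph.get(node_id)
|         if node is None:  # leaf: a treatment recommendation
|             return ["  " * depth + f"-> {node_id}"]
|         lines = ["  " * depth + node.question]
|         for answer, target in node.options.items():
|             lines.append("  " * (depth + 1) + f"[{answer}]")
|             lines.extend(render_outline(graph, target, depth + 2))
|         return lines
|
|     print("\n".join(render_outline(GUIDELINE, "n1")))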
| maxerickson wrote:
| You could generate the document from the graph and then
| attach it as data.
| JumpCrisscross wrote:
| > _could generate the document from the graph and then
| attach it as data_
|
| Much easier for doctors to draft PDFs than graphs.
| seb1204 wrote:
| I have not drafted a PDF myself and I doubt doctors do. My guess
| is they work in a word processor or spreadsheet application and
| then export or print to PDF. An interactive interface could spit
| out the PDF with the decision tree at the end. This solution
| would still mean the decision tree source lives in some software
| package.
| crazygringo wrote:
| Exactly. The PDFs _work_. They won't break. You can see all the
| information with your own eyes. You can send them by e-mail.
|
| A wizard-type system hides most of the information from you; it
| might have bugs you aren't aware of; if you want to glance at an
| alternative path, you can't; it's going to be locked behind
| registered users; and the system can go down.
|
| I think much more intelligent computer systems are the future
| in health care, but I doubt the way to start is with yet
| another custom tool designed specifically for cancer guidelines
| and nothing else.
| crabmusket wrote:
| > it's going to be locked into registered users, the system
| can go down
|
| I didn't see anything in the screenshots presented that wouldn't
| be doable in a single HTML file containing the data, styles and
| scripts.
|
| This is a countercultural idea but it fits so many use cases;
| it's a tragedy we don't do this more often. Instead the two
| options always seem to be either PDF or SaaS.
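|
| (Illustrative sketch of that single-file idea, assuming a made-up
| JSON structure for the guideline data; nothing here is from the
| article.)
|
|     # Minimal sketch: bundle guideline data, styles, and a tiny viewer
|     # script into one self-contained HTML file that works offline from
|     # a file:// URL. The JSON structure is hypothetical.
|     import json
|
|     guideline = {"id": "example",
|                  "nodes": [{"id": "n1", "question": "Tumor resectable?"}]}
|
|     html = f"""<!doctype html>
|     <html><head><meta charset="utf-8">
|     <style>body{{font-family:sans-serif}}</style></head>
|     <body><div id="app"></div>
|     <script>
|       const DATA = {json.dumps(guideline)};  // embedded data, no server
|       document.getElementById("app").textContent = DATA.nodes[0].question;
|     </script></body></html>"""
|
|     with open("guideline.html", "w", encoding="utf-8") as f:
|         f.write(html)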
| ajsnigrutin wrote:
| > The PDF's work. They won't break.
|
| Not just that: PDFs are one of the few formats where I'm willing
| to bet my own money that they'll still work in 10 or 20 years.
|
| Even basic html has changed, layouts look different depending
| on many factors, and even the <blink>-ing doesn't work
| anymore.
| seb1204 wrote:
| A case-specific PDF could be created and stored in the patient's
| electronic records. Such a PDF could just highlight the decision
| tree path.
| inferiorhuman wrote:
| Sure, PDF/A is an ISO-standardized subset of the larger PDF
| spec designed expressly for archival purposes. You could do
| that with HTML but then how would you get your crypto
| mining AI chat bot powered by WASM to work?
| slaucon wrote:
| Hey author here! Appreciate the feedback! Agreed on importance
| of portability and durability.
|
| I'm not trying to build this out or sell it as a tool to
| providers. Just wanted to demo what you could do with
| structured guidelines. I don't think there's any reason this
| would have to be unique to a practice or EMR.
|
| As sister comments mentioned, I think the ideal case here would
| be if the guideline institutions released the structured
| representations of the guidelines along with the PDF versions.
| They could use a tool to draft them that could export in both
| formats. Oncologists could use the PDFs still, and systems
| could lean into the structured data.
| Dalewyn wrote:
| >Agreed on importance of portability and durability.
|
| I think "importance" is understating it, because permanent
| consistency is practically the only reason we all (still) use
| PDFs in quite literally every professional environment as a
| lowest common denominator industrial standard.
|
| PDFs will always render the same, whether on paper or a
| screen of any size connected to a computer of any
| configuration. PDFs will almost always open and work given
| Adobe Reader, which these days is simply embedded in Chrome.
|
| PDFs will almost certainly Just Work(tm), and Just
| Working(tm) is a god damn virtue in the professional world
| because time is money and nobody wants to be embarrassed
| handing out unusable documents.
| abtinf wrote:
| PDFs generally will look close enough to the original
| intent that they will almost always be usable, but will not
| always render the same. If nothing else, there are
| seemingly endless font issues.
| lstamour wrote:
| In this day and age that seems increasingly like a solved
| problem for most end users; when it isn't, it's often a
| client-side issue or a very old method of generating the PDF.
|
| Modern PDF supports font embedding of various kinds
| (legality is left as an exercise to the PDF author) and
| supports 14 standard font faces which can be specified
| for compatibility, though more often document authors
| probably assume a system font is available or embed one.
|
| There are still problems with the format as it foremost
| focuses on document display rather than document
| structure or intent, and accessibility support in
| documents is often rare to non-existent outside of
| government use cases or maybe Word and the like.
|
| A lot of usability improvements come from clients that
| make an attempt to parse the PDF to make the format
| appear smarter. macOS Preview can figure out where
| columns begin and end for natural text selection, Acrobat
| routinely generates an accessible version of a document
| after opening it, including some table detection.
| Honestly creative interpretation of PDF documents is
| possibly one of the best use cases of AI that I've ever
| heard of.
|
| While a lot about PDF has changed over the years the
| basic standard was created to optimize for printing. It's
| as if we started with GIF and added support to build
| interactive websites from GIFs. At its core, a PDF is
| just a representation of shapes on a page, and we added
| metadata that would hopefully identify glyphs, accessible
| alternative content, and smarter text/line selection, but
| it can fall apart if the PDF author is careless,
| malicious or didn't expect certain content. It probably
| inherits all the weirdness of Unicode and then some, for
| example.
| seb1204 wrote:
| I would assume these decision tree PDFs use a commonly available
| font. Layout and interpreted outcomes should be the same.
| killjoywashere wrote:
| The cancer reporting protocols from the College of American
| Pathologists are available in structured format (1). No major
| laboratory information system vendor implements them properly,
| and their implementation errors cause some not-insignificant
| problems with patient care (oncologists calling the lab asking
| for clarification, etc.). This has pushed labs to make policies
| disallowing the use of those modules, and individual pathologists
| revert to their own non-portable templates in Word documents.
|
| The medical information systems vendors are right up there
| with health insurance companies in terms of their investment
| in ensuring patient deaths. Ensuring. With an E.
|
| (1) https://www.cap.org/protocols-and-guidelines/electronic-
| canc...
| all2 wrote:
| > The medical information systems vendors are right up
| there with health insurance companies in terms of their
| investment in ensuring patient deaths. Ensuring. With an E.
|
| Can you expand on this?
| righthand wrote:
| Medical information system vendors only care about making
| a profit, not implementing actual solutions. The
| discrepancies between systems can lead to bad information,
| which can cost people their lives.
| ethbr1 wrote:
| As an analogy, imagine if the consequence of Oracle doing
| Oracle-as-usual things was worse medical outcomes. But
| they did them anyway for profit.
|
| That's basically medical information system vendors.
|
| The fact that the US hasn't pushed open source EMRs
| through CMS is insane. It's literally the perfect problem
| for an open solution.
| caboteria wrote:
| It's worse than that. VistA is a world-class open source
| EMR that the VA has been trying to kill for decades.
| PoignardAzur wrote:
| I mean, you're attributing malice, but it could just be
| that reliably implementing the formats is a really really
| hard problem?
| TheAceOfHearts wrote:
| How about fixing the format? Something that is obviously
| broken and resulting in patient deaths should really be
| considered a top priority. It's either malice or massive
| incompetence. If these protocols were open, there would
| definitely be volunteers willing to help fix them.
| mort96 wrote:
| Incompetence at this level is intentional: it means
| someone doesn't think they'll see ROI from investing
| resources into improving it. Calling it malice is
| appropriate, I feel.
| layer8 wrote:
| If there is no ROI, investing further resources would be
| charity work. I don't think it's accurate to call a
| company not doing so malicious.
| WitCanStain wrote:
| Not actively malicious perhaps, but prioritising profits
| over lives is evil. Either you take care to make sure the
| systems you sell lead to the best possible outcomes, or
| you get out of the sector.
| layer8 wrote:
| The company not existing at all might be worse though? I
| think it's too easy to make blanket judgments like that
| from the outside, and it would be the job of regulation
| to counteract adverse incentives in the field.
| PoignardAzur wrote:
| You seem to think that the default assumption is that
| fixing the format is easy/feasible, and I don't see why.
| Do you have domain knowledge pointing that way?
|
| It's a truism in machine learning that curating and
| massaging your dataset is the most labor-intensive and
| error-prone part of any project. I don't see why that would
| stop being true in healthcare just because lives are on
| the line.
| prepend wrote:
| I think there are more options than malice or
| incompetence. My theory is difficulty.
|
| There are multiple countries with socialized medicine and
| no profit motive, and it's still not solved.
|
| I think it's just really complex, with high negative
| consequences for a mistake. It takes lots of investment
| with good coordination to solve, and there's an "easy
| workaround" with PDFs that distributes liability to
| practitioners.
| ethbr1 wrote:
| Healthcare suffers from strict regulatory requirements,
| underinvestment in organic IT capabilities, and huge
| integration challenges (system-to-system).
|
| Layering any sort of data standard into that environment
| (and evolving it in a timely manner!) is nigh impossible
| without an external impetus forcing action (read:
| government payer mandate).
| jjmarr wrote:
| It doesn't look like the XML data is freely accessible.
|
| If I could get access to this data as a random student on
| the internet, I'd love to create an open source tool that
| generates an interactive visualization.
| zo1 wrote:
| People could potentially properly implement them if they
| were open and available:
|
| "Contact the CAP for more information about licensing and
| using the CAP electronic Cancer Protocols for cancer
| reporting at your institution."
|
| This stinks of the same gate-keeping that places like NIST
| and ISO do, charging you for access to their "standards".
| fl0id wrote:
| For liability reasons alone, you can't just have random
| people working on health/lab stuff, and the requisite
| vendors do have access to these standards.
| joshuaissac wrote:
| According to what killjoywashere said, the vendors do not
| want to implement these standards. So if CAP wants the
| standards to be relevant, they should release them for
| random people to implement.
| prepend wrote:
| Aren't all NIST standards free as they are a government
| body?
| prepend wrote:
| I believe you have good intentions, but someone would need to
| build it out and sell it. And it requires lots of
| maintenance. It's too boring for an open source community.
|
| There's a whole industry that attempts to do what you're doing,
| and there's a reason why protocols keep getting punted back to
| PDF.
|
| I agree it would be great to release structured
| representations. But I don't think there's a standard for that
| representation, so it's tricky as to who will develop and
| maintain the data standard.
|
| I worked on a decision support protocol for Ebola and it was
| really hard just to get code sets released in Excel, never mind
| the actual decision gates in a computable form.
|
| I hope we make progress on this, but I think the incentives
| are off for the work needed to make the data structures.
| KPGv2 wrote:
| You say this, but on the other hand, the author alleges that
| the places that use these custom tools achieve better outcomes.
| You didn't address this point one way or the other.
|
| Do you think this is a completely fabricated non-explanation?
| It's not like the link says "the worst places use these custom
| tools."
| ahardison wrote:
| Totally valid concerns. If you have time, I would like to show
| you my solution to get your thoughts as I believe I have found
| ways to mitigate all of your concerns. Currently I am using
| STCC (Schmitt-Thompson Clinical Content). I have sent you some
| of the PDFs we use for testing.
| zahlman wrote:
| It would, I imagine, be much easier to generate a PDF from the
| tool's internal flowchart representation than the other way
| around.
| Spooky23 wrote:
| I think there's value if it can scale down.
|
| Community oncologists have limited technology resources as
| compared to a national cancer center. If we can make their
| lives easier, it can only be a good thing.
|
| That said, I like published documents like PDFs - systems
| usually make it hard to compare the June release with the
| September release.
| layer8 wrote:
| I agree. However, since the PDF format supports structured
| data, one could in principle have it both ways, within a single
| file.
| queuebert wrote:
| ^ This. See, e.g., https://lab6.com/ for some interesting
| tricks with the PDF format.
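|
| (A minimal sketch of that "both ways in one file" idea using the
| pypdf library; the file names and JSON payload are hypothetical,
| and this is not how NCCN distributes anything today.)
|
|     # Ship the human-readable PDF and the machine-readable guideline
|     # data in a single file by attaching JSON to the PDF.
|     import json
|     from pypdf import PdfReader, PdfWriter
|
|     reader = PdfReader("guidelines.pdf")  # the PDF doctors already read
|     writer = PdfWriter()
|     writer.append(reader)                 # copy all pages unchanged
|
|     data = {"guideline": "example", "version": "2.2024", "nodes": []}
|     writer.add_attachment("guideline.json",
|                           json.dumps(data).encode("utf-8"))
|
|     with open("guidelines_with_data.pdf", "wb") as f:
|         writer.write(f)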
| upghost wrote:
| It's so much worse than you could possibly imagine. I worked for
| a healthcare startup working on patient enrollment for clinical
| oncology trials. The challenges are amazing. Quite frankly it
| wouldn't matter if the data were in plaintext. The diagnostic
| codes vary between providers, the semantics of the diagnostic
| information differ between providers, electronic health records
| are a mess, and things are written entirely in natural language
| rather than in some kind of data structure.
| Anyone who's worked in healthcare software can tell you way more
| horror stories.
|
| I do hope that LLMs can help straighten some of it out, but as
| anyone who's done healthcare software knows, the problems are
| not technical, they are quite human.
|
| That being said, one bright spot is that we've (my colleagues, not
| me) made a huge step forward using category theory and Prolog to
| discover the provably optimal 3+3 clinical oncology dose
| escalation trial protocol[1]. David gave a great presentation on
| it at the Scryer Prolog meetup[2] in Vienna.
|
| It's kind of amazing how in the dark ages we are with medicine.
| Even though this is the first EXECUTABLE/PROGRAMMABLE SPEC for a
| 3+3 cancer trial, he is still fighting to convince his medical
| colleagues and hospital administrators that this is the optimal
| trial because -- surprise -- they don't speak software (or
| statistics).
|
| [1]: https://arxiv.org/abs/2402.08334
|
| [2]: https://www.digitalaustria.gv.at/eng/insights/Digital-
| Austri...
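|
| (For readers unfamiliar with the protocol: a toy Python rendering
| of the standard 3+3 escalation rule, just to make it concrete.
| This is NOT the Prolog/category-theory formalization from the
| linked paper, and the example dose data is invented.)
|
|     def three_plus_three(dlt_counts):
|         """dlt_counts[d] = (DLTs in first cohort of 3,
|         DLTs in the expansion cohort of 3, or None if not enrolled)."""
|         mtd = None
|         for dose, (first3, next3) in enumerate(dlt_counts):
|             if first3 == 0:
|                 mtd = dose      # 0/3 toxicities: escalate past this dose
|                 continue
|             if first3 == 1 and next3 is not None and first3 + next3 <= 1:
|                 mtd = dose      # 1/6 toxicities: still tolerable
|                 continue
|             break               # >=2 toxicities: stop; MTD is previous dose
|         return mtd
|
|     # dose 0: 0/3 DLTs, dose 1: 1/3 then 0/3 more, dose 2: 2/3 -> MTD = dose 1
|     print(three_plus_three([(0, None), (1, 0), (2, None)]))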
| slaucon wrote:
| This is a fascinating idea!
| sebmellen wrote:
| Have you read Jake Seliger's pieces on oncology clinical trials
| https://jakeseliger.com/.
| upghost wrote:
| Oh wow. No, that's heartbreaking. I'll have to read up on
| this. Reminds me of David explaining the interesting and
| somewhat surprisingly insensitive language the oncology
| literature uses towards folks going through this. It's there
| for historical reasons but slow to change.
|
| It also shows how important getting dose escalation trials
| right is. The whole point is finding the balance point where
| "the cure is NOT worse than the disease". A bad dose can be worse
| than the cancer itself, and conducting the trials correctly is
| extremely important... and this really underscores the human
| cost. Truly heartbreaking :(
| londons_explore wrote:
| Decision trees work for making decisions...
|
| But they don't work as well as other decision-making
| techniques... Random forests, linear models, neural nets, etc.
| are all decision-making techniques at their core.
|
| And decision trees perform poorly for complex systems where lots
| of data exists - i.e. human health.
|
| So why are we using a known-inferior technique simply because
| it's easier to write down in a PDF file, reason about in a
| meeting, or explain to someone?
|
| Shouldn't we be using the most advanced mathematical models
| possible with the highest 'cure' probability, even if they're so
| complex no human can understand them?
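|
| (To make the comparison concrete, a minimal scikit-learn sketch
| on synthetic data; illustrative only, not clinical evidence, and
| the dataset parameters are arbitrary.)
|
|     from sklearn.datasets import make_classification
|     from sklearn.ensemble import RandomForestClassifier
|     from sklearn.model_selection import cross_val_score
|     from sklearn.tree import DecisionTreeClassifier
|
|     X, y = make_classification(n_samples=2000, n_features=30,
|                                n_informative=10, random_state=0)
|
|     # A shallow tree is readable, like a guideline; the ensemble is
|     # opaque but usually more accurate on the same data.
|     tree = DecisionTreeClassifier(max_depth=4, random_state=0)
|     forest = RandomForestClassifier(n_estimators=200, random_state=0)
|
|     print("tree  :", cross_val_score(tree, X, y, cv=5).mean())
|     print("forest:", cross_val_score(forest, X, y, cv=5).mean())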
| wizzwizz4 wrote:
| Models too complex for humans to understand don't, in practice,
| have a high 'cure' probability.
| s1artibartfast wrote:
| Data generation is usually based on decision tree models as
| well, so they match the resolution of the available data.
|
| The practice of real world medicine often interpolates between
| these data points.
| epcoa wrote:
| > complex systems where lots of data exists
|
| Not a lot of high quality data exists for human health.
| Clinical guidelines for many diseases are often built on
| surprisingly scant evidence.
|
| > even if they're so complex no human can understand them?
|
| That'll be wonderful to explain in court when they figure out
| it was just data smuggling or whatever other bias.
| epistasis wrote:
| In cancer there's an abundance of clinical trials with high
| quality data, but it is all very _complex_ in terms of
| encoding what the clinical trial actually tested.
|
| Go to a clinical cancer conference and you will see the grim
| reality of 10,000s of people contributing to the knowledge
| discovery process with their cancer care. There is an inverse
| relationship between the number of people in a trial and the
| amount of risk that goes into that trial, but it is still a
| massive amount of data that needs to be codified into some
| sensible system, and it's hard enough for a person to do it.
|
| > That'll be wonderful to explain in court when they figure
| out it was just data smuggling or whatever other bias.
|
| What do you mean by this? I'm not aware of any data smuggling
| that has ever happened in a clinical trial. The "bias" is
| that any research hypothesis comes from the fundamentally
| biased position of "I think the data is telling me this" but
| I've seen very little bias of truly bad hypotheses in cancer
| research like those that have dominated, say Alzheimer's
| research. Any research malfeasance should be prosecuted to
| the fullest, but I don't think cancer research has much of
| it. This was a huge scandal, but I don't think it pointed to
| much in the way of bad research in the end:
|
| https://www.propublica.org/article/doctor-jose-baselga-
| cance...
| epcoa wrote:
| By smuggling and bias I meant in an ML model. Smuggling was
| a bit informal, but I was referring to models overfit on
| unintended features or artifacts.
| londons_explore wrote:
| But we have well-established ways to deal with those...
| test/validation sets, n-fold cross-validation, etc.
|
| Even if there were some undetected overfitting or data
| contamination, the result would most probably still
| be better than a hand-made decision tree over the same
| data...
| wizzwizz4 wrote:
| Hand-made decision trees are open to inspection,
| comprehension, and adaptation. There is no way to adapt an
| opaque ML model to new findings / an experimental
| treatment except by producing a new model.
| epcoa wrote:
| OK, until you can sue the AI, you need to find a doctor OK
| with putting their license behind saying "I have no idea how
| this shiny thing works". There are indeed some that will,
| but it's not the consensus.
| a1o wrote:
| I once parsed some mind maps that had been constructed with a
| tool and exported as PDFs (the original sources were lost a long
| time ago). I used Python with Tesseract for the text, plus
| OpenCV, and it worked alright. I am curious why the author went
| with LLMs, but I guess with the amount of data mentioned it
| wasn't hard to recheck everything later.
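|
| (A minimal sketch of that non-LLM route, using the pdf2image and
| pytesseract libraries; the file name and DPI are arbitrary, and
| the node/edge geometry would still need OpenCV contour work on
| top of this.)
|
|     from pdf2image import convert_from_path
|     import pytesseract
|
|     # Rasterize each PDF page, then run Tesseract OCR on the images.
|     pages = convert_from_path("mindmap.pdf", dpi=300)  # PIL images
|     for i, page in enumerate(pages):
|         text = pytesseract.image_to_string(page)
|         print(f"--- page {i + 1} ---")
|         print(text)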
| inopinatus wrote:
| > The whole set of guidelines for a type of cancer breaks down
| into a few disjointed directed graphs
|
| Nothing undermines medicine quite so thoroughly as yet another
| astronaut trying to force it into a data structure.
| prepend wrote:
| Comically, I worked in this space and initially tried to get
| decision support working with data structures and code sets and
| such.
|
| I ended up only really contributing the addition of version
| numbers to the PDF, so at least people knew they had the latest
| and same versions. And that took a year, just to get version
| numbers added to guideline PDFs.
| johnisgood wrote:
| That is wild; one would think versioning is extremely
| important. They tend to just put the timestamp in the
| filename (sometimes), which I guess is better than nothing.
|
| Don't signed PDFs include a timestamp, however?
| prepend wrote:
| Getting it in the file name was kind of easy. But I meant
| adding it visibly in the PDF guidance so readers could
| tell. Just numbers in the lower left corner. Or maybe
| right.
|
| The guideline was available via URL so the filename
| couldn't change.
| LorenPechtel wrote:
| The real problem is that the guidelines are written for humans in
| the first place. Workarounds like this shouldn't be needed;
| going from a machine-friendly layout to a human-friendly one is
| usually quite easy.
|
| And from what he says, a decision tree isn't really the right
| model in the first place. What about no tree at all, just a heap
| of records in a SQL database? You do a query on the known
| parameters; if the response comes back with only one item in the
| treatment column, you follow it. If it comes back with multiple
| items, you look at what would be needed to distinguish them and
| do the test(s).
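|
| (A minimal sketch of that "heap of records" idea with sqlite3;
| the schema, column names and treatments are invented for
| illustration.)
|
|     import sqlite3
|
|     con = sqlite3.connect(":memory:")
|     con.execute("""CREATE TABLE guideline (
|         stage TEXT, her2 TEXT, er TEXT, treatment TEXT)""")
|     con.executemany("INSERT INTO guideline VALUES (?,?,?,?)", [
|         ("II", "positive", "positive", "chemo + anti-HER2 + endocrine"),
|         ("II", "positive", "negative", "chemo + anti-HER2"),
|         ("II", "negative", "positive", "endocrine therapy"),
|     ])
|
|     # We know stage and HER2 status, but not ER status yet:
|     rows = con.execute(
|         "SELECT DISTINCT er, treatment FROM guideline "
|         "WHERE stage=? AND her2=?", ("II", "positive")).fetchall()
|
|     if len(rows) == 1:
|         print("Recommend:", rows[0][1])
|     else:
|         print("Multiple options; order the test that distinguishes:", rows)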
| noonanibus wrote:
| Forgive me if I'm mistaken, but isn't this exactly what the FHIR
| standard is meant to address? Not only does it enable global
| health-data exchange using standardized resources, but it's
| already adopted in several national health services, including
| (though not broadly) America. Isn't this simply a
| reimplementation, without the broad iteration that went into HL7?
| nradov wrote:
| Right, it would make more sense to use HL7 FHIR (possibly along
| with CQL) as a starting point instead of reinventing the wheel.
| Talk to the CodeX accelerator about writing an Implementation
| Guide in this area. The PlanDefinition resource type should be
| a good fit for modeling cancer guidelines.
|
| https://codex.hl7.org/
|
| https://www.hl7.org/fhir/plandefinition.html
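|
| (A rough, heavily abridged sketch of what one guideline branch
| might look like as a FHIR R4 PlanDefinition, written as a Python
| dict. Real implementation guides add CQL libraries, terminology
| bindings and much more; the clinical content here is invented.)
|
|     plan_definition = {
|         "resourceType": "PlanDefinition",
|         "status": "draft",
|         "title": "Example adjuvant therapy branch",
|         "action": [{
|             "title": "Recommend adjuvant radiation",
|             "condition": [{
|                 "kind": "applicability",
|                 "expression": {
|                     "language": "text/cql",
|                     # named expression defined in an accompanying CQL library
|                     "expression": "MarginsPositiveAfterSurgery",
|                 },
|             }],
|         }],
|     }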
| joshuakelly wrote:
| This is the comment I was looking for.
|
| You would aim to use CQL expressions inside of a
| PlanDefinition, in my estimation. This is exactly what AHRQ's
| CDS Connect project (AHRQ is part of HHS) aims to create / has
| created. They publish freely accessible computable decision
| support artifacts here:
| https://cds.ahrq.gov/cdsconnect/repository
|
| When they are fully computable, they are FHIR PlanDefinitions
| (+ other resources like Questionnaire, etc) and CQL.
|
| Here's an example of a fully executable Alcohol Use Disorder
| Identification Test:
| https://cds.ahrq.gov/cdsconnect/artifact/alcohol-
| screening-u...
|
| There's so much other infrastructure around the EHR here to
| understand (and take advantage of). I think there's a big
| opportunity in proving that multimodal LLMs can reliably
| generate these artifacts from other sources. It's not the LLM
| actually being a decision support tool itself (though that
| may well be promising), but rather the ability to generate
| standardized CDS artifacts in a highly scalable, repeatable
| way.
|
| Happy to talk to anyone about any of these ideas - I started
| exactly where OP was.
| osmano807 wrote:
| I downloaded and opened a CDS artifact for osteoporosis from
| the link (a disease in my specialty). I need an API key to
| view what a "valueset" entails, so in practice I couldn't
| verify whether the recommendation aligns with clinical
| practice, nor does the CQL provided contain any scientific
| references (even a textbook or a weak recommendation from a
| guideline would be sufficient; I don't think the algorithm
| should be the primary source of the knowledge).
|
| I tried to see whether HL7 was approachable for small teams,
| and I personally became exhausted reading it and trying to
| work out how to implement a subset of it. I know it's the
| "standard", but all of this is kind of unapproachable.
| nradov wrote:
| You can register for a free NLM account to access the
| value sets (VSAC). HL7 standards are approachable for
| small teams but due to the inherent complexity of
| healthcare it can take a while to get up to speed. The
| FHIR Fundamentals training course is a good option for
| those who are starting out.
|
| https://www.hl7.org/training/fhir-
| fundamentals.cfm?ref=nav
|
| It might seem tempting to avoid the complexity of FHIR
| and CQL by inventing your own simple schema or data
| formats for a narrow domain. But I guarantee that what
| you thought was simple will eventually grow and grow
| until you find that you've reinvented FHIR -- badly. I've
| seen that happen over and over in other failed projects.
| Talk to the CodeX accelerator I linked above and they
| should be able to get you pointed in the right direction.
| jdlyga wrote:
| PDFs are a universal, machine readable format.
| sswatson wrote:
| They're only machine-readable in the very weak sense that all
| computer files are machine-readable.
| GeneralMayhem wrote:
| PDFs are the opposite of machine-readable if you want to do
| anything other than render them as images on paper or a screen.
| They're only slightly more machine-readable than binary
| executables.
|
| I hate, hate, hate, hate, _hate_ the practice of using PDFs as
| a system of record. They are intended to be a print format for
| ensuring consistent typesetting and formatting. For that, I
| have no quarrel. But so much of the world economy is based on
| taking text, docx (XML), spreadsheets, or even _CSV_ files,
| rendering them out as PDFs, and then emailing them around or
| storing them in databases. They've gone from being simply a
| view layer to infecting the model layer.
|
| PDFs are a step better than passing around screenshots of text
| as images - when they don't literally consist of a single
| image, that is. But even for reasonably-well-behaved, mostly-
| text PDFs, finding things like "headers" and "sections" in the
| average case is dependent on a huge pile of heuristics about
| spacing and font size conventions. None of that semantic
| structure exists, it's just individual characters with X-Y
| coordinates. (My favorite thing to do with people starting to
| work with PDFs is to tell them that the files don't usually
| contain any whitespace characters, and then watch the horror
| slowly dawn as they contemplate the implications.) (And yes, I
| know that PDF/A theoretically exists, but it's not reliably
| used, and certainly won't exist on any file produced more than
| a couple years ago.)
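|
| (A minimal sketch of what "characters with X-Y coordinates"
| means in practice, using the pdfplumber library; the file name
| and the gap threshold are arbitrary.)
|
|     import pdfplumber
|
|     # Word boundaries have to be inferred from horizontal gaps, since
|     # many PDFs contain no space characters at all.
|     with pdfplumber.open("guidelines.pdf") as pdf:
|         chars = pdf.pages[0].chars  # dicts: 'text', 'x0', 'x1', 'top', 'size'
|         chars.sort(key=lambda c: (round(c["top"]), c["x0"]))
|
|         words, current = [], ""
|         for prev, cur in zip(chars, chars[1:]):
|             current += prev["text"]
|             gap = cur["x0"] - prev["x1"]
|             new_line = round(cur["top"]) != round(prev["top"])
|             if gap > 0.3 * prev["size"] or new_line:
|                 words.append(current)  # a wide-enough gap acts as a space
|                 current = ""
|         if chars:
|             words.append(current + chars[-1]["text"])
|
|         print(words[:20])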
|
| Now, with multi-modal LLMs and OCR reaching near-human levels,
| we can finally... attempt to infer structured data back out
| from them. So many megawatt-hours wasted in undoing what was
| just done. Structure to unstructure to structure again. Why,
| why, why.
|
| As for universality... I mean, sure, they're better than some
| proprietary format that can only be decrypted or parsed by one
| old rickety piece of software that has to run in Win95
| compatibility mode. But they're not better than JSON or XML if
| the source of truth is structured, and they're not better than
| Markdown or - again - XML if the source is mostly text. And
| there are always warts that aren't fully supported depending on
| your viewer.
| tdeck wrote:
| GraphViz has some useful graph description languages that could
| be reused for something like this. There's DOT, a delightful DSL,
| and some kind of JSON format as well. You can then generate a
| bunch of different output formats and it will lay out the nodes
| for you.
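|
| (A minimal sketch with the graphviz Python package; node names
| and labels are invented, and rendering requires the Graphviz
| binaries to be installed.)
|
|     import graphviz
|
|     dot = graphviz.Digraph("guideline")
|     dot.node("q1", "Tumor resectable?", shape="diamond")
|     dot.node("t1", "Surgery then adjuvant therapy", shape="box")
|     dot.node("t2", "Systemic therapy", shape="box")
|     dot.edge("q1", "t1", label="yes")
|     dot.edge("q1", "t2", label="no")
|
|     print(dot.source)                       # plain DOT text
|     dot.render("guideline", format="pdf")   # layout handled for you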
| epistasis wrote:
| Of all the challenges with this, graph layout is beyond
| trivial. It does not rank as a problem, intellectual challenge,
| or even that interesting.
|
| The challenges are all about what goes in the nodes, how to
| define it, how to standardize it across different institutions,
| how to compare it to what was tested in two different clinical
| trials, etc. And if the computerized process goes into clinical
| practice, how is that node and its contents robustly defined so
| that a clinician sitting with a patient can _instantly_
| understand what is meant by its yes/no/multiple choice
| question in terms that have been used in recent years at the
| clinician's conferences.
|
| Addressing the challenges of constructing the graph requires
| deep understanding of the terms, deep knowledge of how 10
| different people from different cultural backgrounds and
| training locations interpret highly technical terms with
| evolving meanings, and deep knowledge of how people could
| misunderstand language or logic.
|
| These guidelines codify evolving scientific knowledge where new
| conceptions of the disease get invented at every conference.
| It's all at the edge of science where every month and year we
| have new technology to understand more than we ever understood
| before, and we have new clinical trials that are testing new
| hypotheses at the edge of it.
|
| Getting a nice visual layout is necessary, but in no way
| sufficient for what needs to be done to put this into practice.
| graphviz wrote:
| Not ... even that interesting?
| graphviz wrote:
| Modularity is an excellent way of attacking complex
| problems. We can all play with algorithms that can carry on
| realistic conversations and create synthetic 3D movies,
| because people worked on problems like making transistors
| the size of 10 atoms, figuring out how processors can
| predict branches with 99% accuracy, giving neural nets
| self-attention, deploying inexpensive and ridiculously fast
| networks all over the planet, and a lot of other stuff.
|
| For many of us, curing cancer may someday become more
| important than almost anything else a computer can help us
| to do. It's just there are so many building blocks to
| solving truly complex problems; we must respect all that.
| dogmatism wrote:
| This is all predicated on the guidelines actually reflecting best
| practices
| epistasis wrote:
| > With properly structured data, machines should be able to
| interpret the guidelines. Charting systems could automatically
| suggesting diagnostic tests for a patient. Alarm bells and "Are
| you sure?" modals could pop up when a course of treatment
| diverges from the guidelines. And when a doctor needs to review
| the guidelines, there should be a much faster and more natural
| way than finding PDFs
|
| I have implemented this computerized process _twice_ at two
| different startups over the past decade.
|
| I would not want the NCCN to do it.
|
| The NCCN guidelines are not stuck in PDFs, they are stuck in the
| heads of doctors.
|
| Once the NCCN guidelines get put into computerized rules, they
| start to be guided _by_ those computerized rules, a second
| influence that takes them away from the fundamental science.
|
| So while I totally agree that there should be systematization
| of the rules, it should be entirely secondary and subservient
| _to_ the best frontier knowledge about cancer, which changes
| _extremely_ frequently. Annually, after every ASCO (the major
| pan-cancer conference) and every disease-specific conference
| (e.g. the San Antonio breast cancer conference), and
| occasionally during the year when landmark clinical trials are
| published, doctors need to update their knowledge from the
| latest trials and their continuing medical education, an entire
| body of knowledge that is complementary to the edges of what the
| NCCN publishes.
|
| Having spanned both computer science and medicine for my entire
| career, I trust doctors to be able to update their rules far
| faster than the programmers and databases.
|
| Please do not get the NCCN guidelines stuck in spaghetti code
| that a few programmers understand, rather than open in PDFs with
| lots of links that anybody can go and chase after.
|
| Edit: though give me a week to digest this article and I may
| change my mind. Maybe the NCCN should be standardizing clinical
| variables enough that the guidelines can trivially be turned
| into rules. That would require that the hypotheses a clinical
| trial tests fit into those rules, however, and that's why I need
| a week of digestion to see if it may even be possible...
| bsder wrote:
| Gee, before talking about complex stuff like decision trees, how
| about we start with something _really_ simple like _not requiring
| a login to download the stupid PDF from NCCN_?
| joshz404 wrote:
| You might be interested in checking out the WHO SMART Guidelines.
| Nothing on cancer yet AFAIK, but it's evolving.
| rukshn wrote:
| I was also thinking about FHIR and SMART guidelines.
|
| But the whole system is a mess. And the whole SMART guideline
| system is controlled by 2-3 gatekeepers who don't listen to any
| ideas other than their own.
| xh-dude wrote:
| The author makes a great case for machine-interpretable standards
| but there is an _enormous_ amount of work out there devoted to
| this, it's been a topic of interest for decades. There's so much
| in the field that a real problem is figuring out what solutions
| match the requirements of the various stakeholders, more than
| identifying the opportunities.
| hulitu wrote:
| > With properly structured data, machines should be able to
| interpret the guidelines.
|
| Yeah, right. And then say "Die". /s
|
| The guidelines shall be structured properly. It is not rocket
| science.
| grumbel wrote:
| Same reason why datasheets are still PDFs. It's a reliable, long
| lasting and portable format. And while it's kind of ridiculous
| that we are basically emulating paper, no other format fills that
| niche.
|
| It's the niche HTML should be able to fill, since that was its
| original purpose, but doesn't, since all the focus over the last
| 20 or so years has been on everything except making HTML a
| better format for information exchange.
|
| Trivial things like bundling up a complex HTML document into a
| single file don't have standard solutions. Cookies stop working
| when you are dealing with file:// URLs, and a lot of other really
| basic stuff just doesn't work or doesn't exist. Instead you get
| offshoot formats like EPUB that are mostly HTML, but not actually
| supported by most browsers.
| schu wrote:
| Would love to take a look at the code, in particular at how the
| data extraction and transformation is implemented.
|
| As a side note, the German associations of oncology publish their
| guidelines here (HTML and SVG graphs):
| https://www.onkopedia.com/de/onkopedia/guidelines
| rmrfchik wrote:
| Because writers don't think about readers. PDF is one of the
| worst formats for scientific/technical info, and yet here we
| are. I've dropped a lot of papers from arXiv because they're
| formatted as two-column, non-zoomable PDFs.
| troysk wrote:
| I find the web (HTML/CSS) the most open format for sharing. PDFs
| are hard to consume on smaller devices and much harder for
| machines to read. I am working on a feature at Jaunt.com to
| convert PDFs to HTML; it shows up as a reader-mode icon. Please
| try it out and see if it is good enough. I personally think we
| need to do a much better job. https://jaunt.com
| ErigmolCt wrote:
| PDFs can be notoriously difficult to work with on smaller
| devices
| breytex wrote:
| Shouldn't the end goal just be to train an AI on all the PDFs and
| give the doctors an interface to plug in all the details and get
| a treatment plan generated by that AI?
|
| Working on the data structure feels like an intermediate solution
| on the way to that AI, which is not really necessary. Or am I
| missing something?
| fl0id wrote:
| Your end goal, maybe. Not patients' or doctors' goal, for sure.
| pjc50 wrote:
| How does your treatment AI get its liability insurance?
| prmoustache wrote:
| I am not sure patients and doctors are interested in adding
| hallucination generators to the list of their problems.
| ska wrote:
| AI/ML techniques in medicine have been applied clinically since
| at least the 90s. Part of the reason you don't see them used
| ubiquitously is a combination of a) it hasn't worked all that
| well in many scenarios so far and b) medicine is by nature
| quite conservative, for a mix of good and not so good reasons.
| whiterock wrote:
| Why can this not just be a website? Isn't this a perfect use case
| for HTML and hyperlinks?
| mav3ri3k wrote:
| Excellent read. This consolidated and catalyzed my spurious
| thoughts around personal information management. The input is
| generally markdown/PDF but over time becomes highly useless for a
| single person. There would be value if it were passed through
| such a system over time.
| ramoz wrote:
| Cool tool. From my experience the PDF was easy to traverse.
|
| The hardest part for me was understanding that treatment options
| could differ (i.e. between the _top_ hospitals treating the
| cancer). And there were a few critical options to consider. NCCN
| paths were traditional, but there are in-between decisions to
| make, or alternative paths. ChatGPT was really helpful in that
| period. "2nd" opinions are important... but then you ask the top
| 2 hospitals and they differ in opinion, and any other hospital is
| typically in one of those camps.
| hashishen wrote:
| Funny, I just had the thought the other day about how we as a
| society need to move past the PDF format, or at least update it
| to be editable in traditional document software. The fact that
| Google Docs will export as a PDF and not have it saved among your
| documents is proof it's gotten to a point of inefficiency, and
| that's just one example.
| easytigerm wrote:
| The OP will be pleased to know that they're not the first person
| to think of this idea. Searching for "computable clinical
| guidelines" will unearth a wealth of academic literature on the
| subject. A reasonable starting point would be this paper [1].
| Indeed people have been trying since the 70s, most notably with
| the famous MYCIN expert system. [2]
|
| As people have alluded to and the history of MYCIN shows, there's
| a lot more subtlety to the problem than appears on the surface,
| with a whole bunch of technical, psychological, sociological and
| economic factors interacting. This is why cancer guidelines are
| stuck in PDFs.
|
| Still, none of that should inhibit exploration. After all, just
| because previous generations couldn't solve a problem doesn't
| mean that it can't be solved.
|
| [1] https://pmc.ncbi.nlm.nih.gov/articles/PMC10582221/
|
| [2] https://www.forbes.com/sites/gilpress/2020/04/27/12-ai-
| miles...
| adolph wrote:
| To the author:
|
| The above is a high quality comment with worthy areas to study.
|
| Additionally, I would draw your attention to NCCN's "Developer
| API", which is not interesting technologically, but is
| interesting for how it reflects the IP landscape.
|
| https://www.nccn.org/developer-api
| fasa99 wrote:
| WAIT ... Hold up... what have we here:
| https://www.nccn.org/compendia-templates/compendia/nccn-comp...
|
| TLDR: The NCCN surely has a clean, pretty database of these
| algorithms. They output these junky PDFs for free. Want cleaner
| "templates" data? Pay the toll, please.
|
| What we have here is a walled garden. Want the treatment
| algorithm? Here, muck through this huge disaster of 999-page
| PDFs. Oh, you want the underlying data? Well, well, it's going to
| cost you.
|
| What we have here is not so much different than the paywalls of
| an academic journal. Some company running a core service to an
| altruistic industry and skimming a price. OP is just writing an
| algorithm to unskim it. And nobody can really use it without
| making the thing bulletproof lest a physician mistreat a cancer.
|
| To my mind this is yet another unethical corner of healthcare.
| These clunky algorithms, if a physician uses them, slow the
| process and introduce a potential source of error, ultimately
| harming patients. Harming patients for increased revenue. The
| physicians writing and maintaining the guidelines look the other
| way given they get a paycheck off it, plus the prestige of it
| all, similar to some scenarios in medicine itself.
|
| The natural thing to do is crack open the database and let
| algorithms utilize it. This whole thing of dumping data in an
| abstruse, machine-hostile format, then building a Rube Goldberg
| machine to reverse the transformation, is not right.
|
| Anyway, I mention this because there seems to be a thought of
| "these PDFs are messy, let's clean them" without looking at
| what's really going on here.
| persona wrote:
| OP is talking about the NCCN Guidelines, which don't seem to
| be available in other formats or via an API. From their website:
|
| NCCN Clinical Practice Guidelines in Oncology (NCCN
| Guidelines(r)): The NCCN Guidelines(r) document evidence-based,
| consensus-driven management to ensure that all patients receive
| preventive, diagnostic, treatment, and supportive services that
| are most likely to lead to optimal outcomes.
|
| Format(s) Available for Licensing: PDF. API not available.
| gcanyon wrote:
| The real question is: why is _everything_ stuck in PDFs? And the
| more important meta-question is: why don't PDFs support metadata
| (they do, somewhat)? So much of what we do is essentially
| machine-to-machine, but trapped in a format designed entirely for
| human-to-human (also lump in a bit of machine-to-human).
|
| Adobe has had literally a third of a century to recognize this
| need and address it. I don't think they're paying attention :-/
| layer8 wrote:
| PDFs can have arbitrary files embedded, like XML and JSON. The
| format also supports a logical structure tree (which doesn't need
| to correspond to the visual structure) that can carry arbitrary
| attributes (data) on its structure elements. And then there are
| XML Forms. You can really have pretty much anything machine-
| processable you want in a PDF. One could argue that it is _too_
| flexible, because any design you come up with that uses those
| features for a particular application is unlikely to be very
| interoperable.
| queuebert wrote:
| PDFs are essentially compressed PostScript minus the programming
| language (full PostScript is Turing complete), so a PDF can in
| principle describe just about anything you want on a page.
| gmueckl wrote:
| Software that gives treatment instructions may be a medical
| device requiring FDA approval. You may be breaking the law if you
| give it to a medical professional without such approval.
| gibsonf1 wrote:
| The idea of adding hallucination to medical advice seems very
| dangerous.
| xh-dude wrote:
| There's also a regression-to-the-mean problem: the systems
| really shouldn't optimize just for the easier cases. I wonder
| if that's a direct tradeoff; I think maybe it is, given the kinds
| of things I see used to tweak out hallucinations.
| guipsp wrote:
| I have to ask: did the author consult any medical professionals
| when writing this article? Is this really something that needs to
| be fixed, and will his solution actually fix it?
|
| It seems to me that ignoring the guideline is a physician's
| decision, and when it is ignored (for good or for bad), it is not
| because the guidelines are not available in JSON.
| queuebert wrote:
| As a cancer researcher myself, I'd point out that some branches
| of the decision trees in the NCCN guidelines are based on studies
| in which multiple options were not statistically significantly
| different, but all were better than the placebo. In those cases,
| the clinician is free to use other factors to decide which arm to
| take. A classic example of this is surgery vs radiation for
| prostate cancer. Both are roughly equally effective, but very
| different experiences.
| awinter-py wrote:
| a decision tree is just a csv trapped in amber. share the actual
| data
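|
| (A minimal sketch of the "decision tree as a CSV" idea: one row
| per edge of the tree is enough for both a renderer and a rules
| engine. Columns and content are invented.)
|
|     import csv, io
|
|     CSV = """node,question,answer,next_node,treatment
|     n1,Tumor resectable?,yes,n2,
|     n1,Tumor resectable?,no,,Systemic therapy
|     n2,Margins clear?,yes,,Surveillance
|     n2,Margins clear?,no,,Adjuvant radiation
|     """
|
|     for row in csv.DictReader(io.StringIO(CSV)):
|         target = row["next_node"] or "-> " + row["treatment"]
|         print(row["node"], row["question"],
|               "[" + row["answer"] + "]", target)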
| anigbrowl wrote:
| Why isn't all human knowledge in one big JSON file? Guidelines
| are decision trees, but they're not written to be applied by
| rote, because the identical patients with identical cancers
| posited in the hypothetical _don't exist_. The guidelines are
| not written for maverick clinician movie protagonists to navigate
| the decision tree in real time while racing against an
| oncological clock, they're for teams of clinicians who are very
| well trained and used to working with each other, _and_ who have
| the skills to notice where the guidelines might be wrong or need
| expansion or modification. That is, they 're abstractions of the
| state-of-the-art.
|
| Now it'd be nice if these could be treated like source code on
| some sort of medical VCS, to be easily modded or even forked by
| sub-specialists. But wetware dependencies are way more
| complicated than silicon ones, and will remain harder to
| discretize for some time to come.
|
| It's not that the author's aspirations are misguided; they're great.
| But I believe progress in this area is most easily realized as
| part of a relevant team, because what looks conceptually simple
| from outside the system only seems that way because of a lack of
| resolution.
| amai wrote:
| Why is anything stuck in PDFs?
|
| PDFs are good for just one thing: printing. Data stored as a PDF
| is meant to be printed, not processed by any other means.
| inportb wrote:
| Profit.
|
| I use these guidelines all the time in the PDF format for free,
| and I'd love to have these in a structured format. For $3000/year
| you could get 50 users access to PDF prescription templates to
| speed up their work. That's not bad, but it's still all PDF.
|
| For the nice low price of "contact us for pricing," though, you
| could have EMR integration. They couldn't justify $$$$ for EMR
| integration if all this information is easily accessible.
|
| https://www.nccn.org/compendia-templates/nccn-templates-main...
| grovehaw wrote:
| In the UK, guidelines are published by the National Institute
| for Health and Care Excellence (NICE). Guidance is available to
| all in HTML and PDF formats. [0]
|
| [0] https://www.nice.org.uk/guidance
| w10-1 wrote:
| "At their core, guidelines are decision trees"
|
| That's wishful and perhaps not even helpful as a goal. Guidelines
| rarely have the data to cover all possible legs of decisions.
| They report on well-supported findings, offer expert opinions on
| some interpolated cases, and perhaps list factors to consider for
| some of the remainder. If you reduced this to a decision tree,
| you'd find many branches are not covered, and most experts could
| identify factors that should lead to a more complex tree.
|
| The reason is that branches are rarely definitive. It's more like
| quantum probabilities: you have to hold them all at once, and
| only when treatment works or doesn't does the disease (here
| cancer) declare itself as such.
|
| Until the true information architecture of guidelines is
| captured, they will be conveyed as authoritative and educational
| statements of the standard of care.
|
| In almost all cases, it's more important to reduce latency and
| increase transparency (i.e., publish faster but with references)
| than to simplify or operationalize in order to improve uptake.
| Most doctors in dynamic fields don't need the simplification;
| they rely on life-long self-discipline and diligence to overcome
| difficulty in the material, and use guidelines at most as a
| framework for communication and completion, i.e., for knowing
| when they've addressed known concerns.
|
| Structured guidelines mainly enable outsiders to observe and
| control in ways that are likely to be unproductive.
___________________________________________________________________
(page generated 2024-12-24 23:00 UTC)