[HN Gopher] Mapping almost every law, regulation and case in Aus...
___________________________________________________________________
Mapping almost every law, regulation and case in Australia
Hey HN,

After months of hard work, I am excited to share the first ever
semantic map of Australian law.

My map represents the first attempt to map Australian laws, cases
and regulations across the Commonwealth, States and Territories
semantically, that is, by their underlying meaning.

Each point on the map is a unique document in the Open Australian
Legal Corpus, the largest open database of Australian law (which,
full disclosure, I created). The closer any two points are on the
map, the more similar they are in underlying meaning.

As I cover in my article, there's a lot you can learn by mapping
Australian law. Some of the most interesting insights to come out
of this initiative are that:

- Migration, family and substantive criminal law are the most
  isolated branches of case law on the map;
- Migration, family and substantive criminal law are the most
  distant branches of case law from legislation on the map;
- Development law is the closest branch of case law to legislation
  on the map;
- Case law is more of a continuum than a rigidly defined structure
  and the borders between branches of case law can often be quite
  porous; and
- The map does not reveal any noticeable distinctions between
  Australian state and federal law, whether it be in style,
  principles of interpretation or general jurisprudence.

If you're interested in learning more about what the map has to
teach us about Australian law or if you'd like to find out how you
can create semantic maps of your own, check out the full article on
my blog, which provides a detailed analysis of my map and also
covers the finer details of how I built it, with code examples
offered along the way.
Author : ubutler
Score : 334 points
Date : 2024-03-22 07:53 UTC (15 hours ago)
(HTM) web link (umarbutler.com)
(TXT) w3m dump (umarbutler.com)
| mg wrote:
| This is great. This sentence struck a chord with me in
| particular:
|
| > Imagine applying these techniques on the Common Crawl.
| > You would be able to produce a ... map of the internet.
|
| Making maps of things not usually on maps has been my passion for
| years. And I made many of them. One of the more popular ones that
| some of you might know is the Music-Map:
|
| https://www.music-map.com
|
| I have had the urge to make a map of the web for quite a while.
| Already registered the web-map.com domain for it. I did some
| experiments, built a custom crawler and an algorithm which finds
| related websites fast. It showed that the project would be
| feasible.
|
| But I hold back on doing it, because I already run multiple
| experimental maps and have yet to come up with a business model
| for "making maps of everything".
| vsnf wrote:
| I had something similar once - it was a graph of connections
| between all the artists in my Spotify library to see who had
| collab'd with who. It was a lot of fun to see just how
| distantly connected two artists were through a long chain of
| collabs of collabs. Of course, like most human connection
| maps, it mostly came down to a handful of super-connectors who
| collaborate with hundreds of people, who in turn collaborate
| with their own niche groups. But there were some interesting
| groups revealed by it.
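
The "how distantly connected are two artists" question in the comment
above is a shortest-path search over a collaboration graph. A minimal
Python sketch, assuming the graph is held as an adjacency dict; the
artist names and the `collab_distance` helper are invented for
illustration and are not taken from the commenter's project:

```python
from collections import deque


def collab_distance(graph, start, goal):
    """Return the number of collaboration hops between two artists, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        artist, hops = queue.popleft()
        for neighbour in graph.get(artist, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None  # no chain of collaborations links the two artists


# Hypothetical data: each key has collaborated with every artist in its list.
collabs = {
    "A": ["Hub"],
    "B": ["Hub"],
    "Hub": ["A", "B", "C"],
    "C": ["Hub", "D"],
    "D": ["C"],
    "E": [],
}
```

Breadth-first search visits artists in order of distance, so the first
time the goal is reached the hop count is minimal. A "super-connector"
such as `Hub` shortens most paths, which matches the pattern the
comment describes.
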
| dylan604 wrote:
| I was halfway expecting a Six Degrees of Kevin Bacon reference
| here. Disregarding the actual Bacon, I was almost hoping for a
| similar effect, where any 2 artists can be connected in 1 Bacon
| or less.
| ubutler wrote:
| Thanks for sharing that map, I'm going to start using it to
| discover new artists :)
|
| I'd love to see a semantic map of the internet, I'm considering
| having a crack at it as well, but it'd be a monumental task. There
| is this cool map but it's quite dated: http://internet-map.net/
| rmnclmnt wrote:
| So cool, thanks for sharing! I see you've also done it for
| movies, which is pretty cool and useful.
|
| I could not find any technical details on the input data /
| feature extraction / clustering method used in these tools. Do
| you mind sharing what you have used so far?
| mg wrote:
| The Music-Map and the Movie-Map are based on user
| preferences. The Music-Map is based on
| https://www.gnoosic.com and the Movie-Map on
| https://www.gnovies.com, two AI projects I started before the
| maps.
|
| The AI and the mapping algorithm are my own developments. I
| was mostly inspired by thinkers like Douglas Hofstadter and
| John R. Koza.
| tomthe wrote:
| It is really cool and useful. Interesting that you were
| able to gather enough data from users to make it work. I
| guess it was much less useful in the beginning?
|
| I thought of making something similar with data from
| https://musicbrainz.org/
| mg wrote:
| Yes, in the beginning pretty much everybody hated it and
| thought the project was nuts. I got pretty much no
| positive feedback but lots of negative. I was like "But
| it's learning! It's learning!" :) Strangely, that
| convinced almost nobody, even among my friends.
|
| Now that many millions of people have used it, I get a
| lot of great, often enthusiastic feedback on how Gnod
| makes the best recommendations.
|
| That taught me that you can't convince people with just
| an idea. For most people, you have to deliver something
| which is already useful to them.
| Citizen_Lame wrote:
| Your effort is appreciated, but the recommendations miss the
| mark by a considerable margin, to say the least.
| alwyn wrote:
| You are the creator? Thank you for what you do! I've used it
| with pleasure for many years.
| mg wrote:
| Yes. Happy you like it!
| jcul wrote:
| Very cool. I've immediately found some music I really like that
| I've never heard before.
| snats wrote:
| I built a map of all the PDF urls on the internet recently.
|
| I used a tiny embeddings model and PCA for dimensionality
| reduction.
|
| https://weblog.snats.xyz/posts/2024/03/20/
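
For readers unfamiliar with the PCA step mentioned above: PCA projects
high-dimensional embeddings onto the directions of greatest variance.
The sketch below is a dependency-free illustration that uses power
iteration on the covariance matrix. `pca_project` and its parameters
are assumptions made for illustration, not the commenter's code, and a
real pipeline would more likely call a numerical library.

```python
import math
import random


def _normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def pca_project(rows, n_components=2, iterations=200, seed=0):
    """Project rows onto their top principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centred data.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Power iteration: repeated multiplication converges on the
        # eigenvector with the largest remaining eigenvalue.
        v = _normalise([rng.gauss(0, 1) for _ in range(d)])
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            if not any(w):
                break
            v = _normalise(w)
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract the found component so the next pass
        # converges on the next-largest direction of variance.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centred]
```

PCA is a linear projection, whereas PaCMAP and UMAP, suggested in the
reply that follows, are non-linear methods, which is why they can give
a differently shaped map from the same embeddings.
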
| ubutler wrote:
| Interesting, did you try also using PaCMAP or UMAP for
| dimensionality reduction? It might result in a more
| meaningful representation of their underlying semantic
| structure: see the 'mammoth' example in my article.
| snats wrote:
| No! I only tried PCA, but I still have the embeddings.
|
| I'll try later and post results.
| Groxx wrote:
| You might also like: https://everynoise.com/
| quenix wrote:
| This one is mesmerizing. Highly recommend checking it out.
| mbo wrote:
| I did something similar for fragrances a little while ago:
| https://observablehq.com/@55th/every-fragrance-at-once
| itshossein wrote:
| Great job! There is a form to report typos. Is there anywhere
| to report duplicates and more complicated errors?
| mg wrote:
| What is the difference between a typo and a duplicate? If you
| mean that two ways of writing the same name are both legit,
| then you have to decide on one being the more "correct" one.
| After a while Gnod will figure out which one is the more
| common name.
|
| And "complicated errors"?
| infostud wrote:
| Thank you for this effort. Did you access data from
| http://austlii.edu.au ?
| ubutler wrote:
| Nope, the map is built atop the Open Australian Legal Corpus,
| which is the first open database of Australian law (you can
| read about how I built it here:
| https://umarbutler.com/how-i-built-the-largest-open-database...).
| Unfortunately, AustLII is free but not open-source (as in
| licensed under an open-source licence:
| https://austlii.edu.au/copyright.html).
| amb1337 wrote:
| This approach could be used to build a global map of AI and/or
| data privacy legislation and cases that would be potentially very
| valuable and useful, particularly for startups.
| boffinAudio wrote:
| This is really awesome, thanks for the work and thanks for
| sharing.
|
| This is a really interesting form of mapping - would you consider
| doing it for the original occupants' languages, as well?
|
| Australian law itself is fascinating - those outliers on the
| edges of some of the trails are very curious - is this indicating
| that some of this material is authored, possibly by the same
| people/groups whose ontology is transferred with each new
| document?
|
| I'd love to see this semantic map for the original occupants'
| languages.
|
| It would also be interesting to see Australia's human rights
| proclamations and related legislature, as well as its military
| orders and authorizations for involvement in the 5-eyes
| catastrophe somehow, semantically, in this context.
| defrost wrote:
| > would you consider doing it for the original occupants'
| languages, as well?
|
| Bit of a challenge as, of the _many_ languages, few are still
| actively spoken and, as oral unwritten languages, there's an
| issue with inconsistent European spelling creating text that
| truly native speakers still have to learn to read.
|
| For your interest; Aboriginal Language Groups:
| https://mgnsw.org.au/wp-content/uploads/2019/01/map_col_high...
| defrost wrote:
| Really nice writeup, I appreciate the work you've put into that
| in both the descriptive analysis of the data and the technical
| breakdown of the process.
| ubutler wrote:
| Thank you :)
| isoprophlex wrote:
| Thank you so much for replacing the interactive visuals with
| screenshots on mobile! Makes for a much better experience reading
| this on my phone.
| ubutler wrote:
| I'm glad you appreciated that touch :) Seeing as 59% of my
| readers are on mobile, I thought it'd be better to have a
| static image rather than an interactive map which would be
| pretty unusable on a phone.
| amand33p wrote:
| I second it. But there's a bug. If we reduce the browser
| window width, and re-increase it, the charts stay in a
| non-interactive state.
| ivanoconnor wrote:
| Last year, I had a similar idea to "map out" case law and
| legislation in the UK -- as usual, though, life got in the way
| and it's ended up joining my vast collection of half-finished
| projects. Having read your excellent writeup, I'm now feeling
| rather inspired to give it another try! :)
| dleeftink wrote:
| > "we can also see that Australian case law is a continuum of
| sorts"
|
| It definitely provides a pretty picture, but just wanted to
| emphasise the map !== territory adage. The continuum may rather
| be a function of the projection, chosen similarity metric and so
| on.
|
| That does not mean we cannot learn from the map, but that the
| actual 'knowledge structure' of the sum of documents may not be a
| convenient continuum at all.
|
| In any case, the way you've documented this project is
| remarkable, and it does provide a novel view of the Australian
| legal sphere. Thanks for sharing!
| ubutler wrote:
| > It definitely provides a pretty picture, but just wanted to
| emphasise the map !== territory adage.
|
| You're right -- my map does not _necessarily_ represent the
| underlying semantic structure of Australian law, it is an
| approximation, one that is biased by the data I used (which as
| I mentioned, is missing laws and cases from a number of
| jurisdictions), the embedding model I selected and the
| dimensionality reduction model I used to project my embeddings
| into a two-dimensional space, to name a few.
|
| Because I was writing for both legal and data science
| audiences, I tried to avoid sounding like my inferences are
| anything more than just inferences but without getting too
| technical and explaining the inherent limitations of any
| attempt to semantically map knowledge with today's technology.
|
| I will just say though that, having studied law myself,
| Australian case law is indeed somewhat of a continuum. A single
| case may touch on many areas of law and there are no
| restrictions in terms of subject matter on what precedents a
| judge may draw upon in reaching a decision, apart from that
| they are both relevant and binding (or, if they are not
| binding, are not treated as such).
|
| It was also interesting to observe how the final clusters that
| developed were uncannily similar to the way in which I was
| taught law at university. It goes to show, there's a lot of
| thought put into the design of our legal courses here in
| Australia. In fact, there are 11 subjects that are mandatory,
| known as the Priestley 11:
| https://en.wikipedia.org/wiki/Priestley_11. All of those are
| reflected on the map, although some have been rolled up into
| larger categories or divided by other means.
| mistermann wrote:
| I think it can sometimes be useful to take this map !=
| territory concept further - all instances of map != territory
| are not equal, some have the potential for higher utility
| than others. And, I would estimate that
| concepts/methodologies like this (anything that provides
| humans new ways to examine and conceptualize important
| matters) almost certainly have higher potential than
| standard, run of the mill instances of map != territory (the
| likelihood of us _being able to find and harvest_ that
| utility is another layer of complexity, but then so is the
| notion that utility is often found not only in the
| destination, but also in the journey). (Unfortunately, modal
| logic notation seems to currently have no support for
| describing these sorts of concepts, at least according to
| ChatGPT).
|
| The "so what?" of it is that if people (particularly smart
| ones) exclude these additional concepts from their logical
| consideration, it is possible that the idea could be
| dismissed, or have its potential importance estimated to be
| lower than it actually/potentially is, potentially leading to
| an outcome whereby this map _or the underlying methodology_
| (applied to other domains) is not maximally exploited to
| achieve positive outcomes.
| bbor wrote:
| Amazing work. As someone doing self-funded web dev, how do you
| find the time to work on this? Is this a resume booster, a
| product/prototype, or just a labour of love? To say the least,
| this is groundbreaking.
|
| I love your technical explanations, even tho I started skimming
| there. It appears this is all built on modern embedding
| algorithms, plus traditional ML clustering magic. Now that you
| have the basic data, have you thought about using full generative
| models for semantic analysis? Ie "write summaries of this subset
| of cases and tag them with specific situations or intricacies",
| and then do clustering on that? I feel like that's the natural
| next computational step, and surely (hopefully?) what the many
| millions/billions of dollars worth of SWEs that were put to work
| applying LLMs to case law over the past year in America are up
| to.
|
| The very best projects on here are ones where I'm tempted to ask
| to collaborate, even though I know I'm already booked up with
| work through the horizon! I'll have to console myself with a
| comment and a very prestigious place in my "inspirations"
| bookmark folder :)
| defrost wrote:
| The blog's about page might interest you:
|
| https://umarbutler.com/about/
|
| > I'm Umar Butler, an Australian data scientist, legal
| > technologist and AI researcher. This is my blog where I write
| > about law, technology, AI and everything in between.
| >
| > As part of my research into legal technology and AI, I have
| > published, inter alia, the first dataset for training LLMs on
| > Australian law, the largest open database of Australian law and
| > the first open LLM for Australian law.
| >
| > I currently serve as the Assistant Director of Data Science at
| > the Attorney-General's Department. My work centres around the
| > responsible use of AI to enable, accelerate and enhance public
| > decision making and legal and policy analysis, in addition to
| > consulting on the development of key AI policy.
| bbor wrote:
| WOW ok, thanks so much for doing my homework for me! I guess
| I just have to look into a high level government position
| that encourages me to follow my own interests, easy peasy...
| contingencies wrote:
| The problem with Australian law, and I suspect most law, is that
| the practical problems of the actual system appear to be less
| about the theory and more about the absence of enforcement,
| oversight and due process.
| guidedlight wrote:
| *Except Victoria by the looks of it. :-(
| Simon_ORourke wrote:
| Does this make it any way easier to replace lawyers with an LLM
| or expert system?
| ivyirwin wrote:
| Not OP but working on a project in a similar domain (ndaok.com).
| The technology is definitely making it easier to replace
| lawyers. The biggest barrier right now is lawyers themselves.
| In fact our project stopped trying to sell to lawyers because
| it's almost like they purposefully refuse to adopt new
| technology. Instead we've had success with customers trying to
| find a way not to use lawyers when they are not needed.
| Simon_ORourke wrote:
| > trying to find a way not to use lawyers when they are not
| needed.
|
| Kudos to you guys, the elimination of the need for lawyers is
| up there with any societal issue you care to name. It may do
| more for social justice than funding anything else
| lmeyerov wrote:
| This is the heart of most real generative AI systems for
| reasoning about text: index data using this basic technique
| (chunked document embeddings), and when talking to the AI, the
| AI looks up documents from these clusters and loads them in as
| context for making the answer. Many ways to improve over this,
| but it's the heart.
|
| In our case (louie.ai), users will have vector indexed their
| documents into a scalable database like
| OpenSearch/elasticsearch, or we help them do it, and they can
| talk to the data, visualize it, run analytics, etc. For
| example, "get everything on koala adoption from the last decade
| and draw as a clustered map" would generate a hybrid query to
| find "semantically similar" documents based on vectors and also
| symbolically on the time stamps, run it, and then decide to do
| the followup step of visualizing it using the same family of
| viz technique in the article. We haven't tried law yet, but
| already do this for areas like disaster, crime, & misinfo
| intelligence from social media & news. (Imagine: "Alert me when
| ..." or "summarize what...").
|
| We find this approach fast and easy, but for very important
| questions, lower quality than we would like. Imagine a scenario
| like case law around koalas changing precedent over time. RAG
| using Langchain/LLMindex + OpenAI over a vector index doesn't
| solve that kind of thing out of the box. But such problems are
| solvable, and it's pretty fun to work through these kinds of
| issues :)
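
The index-then-look-up loop described in the comment above, including
the "hybrid" filter on timestamps, can be sketched in a few lines. This
is a toy illustration under stated assumptions: `embed` is a word-count
stand-in for a real embedding model, `VOCAB`, `build_index` and
`retrieve` are hypothetical names, and nothing here reflects how
louie.ai or any particular vector database is implemented.

```python
import math
import re

# Assumed toy vocabulary; a real system would use a learned embedding model.
VOCAB = ["koala", "habitat", "lease", "tenant", "visa", "migration"]


def chunk(text, size=20):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: counts of each vocabulary term in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(documents, chunk_size=20):
    """Index (text, year) documents as (chunk, vector, year) entries."""
    index = []
    for text, year in documents:
        for piece in chunk(text, chunk_size):
            index.append((piece, embed(piece), year))
    return index


def retrieve(query, index, k=2, since_year=None):
    """Return the k chunks most similar to the query, optionally filtered by year."""
    q = embed(query)
    candidates = [item for item in index
                  if since_year is None or item[2] >= since_year]
    ranked = sorted(candidates, key=lambda item: cosine(q, item[1]), reverse=True)
    return [piece for piece, _, _ in ranked[:k]]
```

The retrieved chunks would then be placed in the model's context
alongside the question. The year filter is the symbolic half of the
hybrid query; the cosine ranking is the semantic half.
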
| mmsc wrote:
| Cool stuff, reminds me of "a Canadian payroll dependency chart"
| https://news.ycombinator.com/item?id=38843388
| jordanpg wrote:
| This is very cool, congratulations.
|
| When I was in law school, I sometimes visualized the "common law"
| as a web of interdependencies. This is a similar visualization,
| although it doesn't quite capture the dependencies, at least as I
| have always imagined it.
|
| For context, the common law refers to law made by (mostly)
| appellate judges. Sometimes it's built on top of statutory law
| (e.g., providing meaning, interpretation, or definition to
| statutory laws) and sometimes it's completely made up, when there
| is no law "on point." It's made up in the sense that it's
| constructed on top of a long trail of historical precedent,
| sometimes going all the way back to Victorian-era England or even
| older. Really.
|
| (Aside: This is why certain individuals sound so silly when they
| rail against "judge-made law" in the US. Virtually all law in the
| US is "judge-made law.")
|
| Anyway, the common law has always seemed to me to be amenable to
| representation as a graph-like structure where nodes are cases or
| precedents and the edges somehow encode the strength of the
| support for the precedent. I think judges might think twice about
| breaking from precedent (which can be virtuous or not, depending
| on your viewpoint) if they could see a visualization of how
| strong the precedent is.
|
| This representation is a step in that direction and I hope your
| tech can be extended to other common law countries!
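
The graph-like structure imagined in the comment above, with cases as
nodes and edges encoding how strongly later cases support a precedent,
could be prototyped as a weighted citation list. A minimal sketch: the
treatment labels, their weights and the `precedent_strength` helper are
all assumptions made for illustration, not an established legal metric.

```python
from collections import defaultdict

# Hypothetical weights: how much each kind of later treatment adds to
# (or subtracts from) the support for the cited case.
TREATMENT_WEIGHT = {"followed": 1.0, "considered": 0.5, "distinguished": -0.5}


def precedent_strength(citations):
    """Sum the weighted support each cited case receives from later cases.

    `citations` is an iterable of (citing_case, cited_case, treatment).
    """
    strength = defaultdict(float)
    for _citing, cited, treatment in citations:
        strength[cited] += TREATMENT_WEIGHT[treatment]
    return dict(strength)


# Invented example edges, not real cases.
citations = [
    ("Case B", "Case A", "followed"),
    ("Case C", "Case A", "followed"),
    ("Case D", "Case A", "distinguished"),
    ("Case D", "Case B", "considered"),
]
```

A visualization could then scale each node by its score, so a judge
could see at a glance how heavily a precedent has been relied upon.
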
| TheCaptain4815 wrote:
| This is such an interesting use of semantic representation. I
| wonder if it could be used to map out cases vs outcomes, and
| determine sentencing outliers.
| chottocharaii wrote:
| "My map represents the first attempt to map Australian laws,
| cases and regulations across the Commonwealth, States and
| Territories semantically, that is, by their underlying meaning."
|
| I think Jade.io has had a go at this, IIRC. This isn't to detract
| from your amazing work though, great stuff.
| ubutler wrote:
| Thank you :). Would you mind sharing what you have in mind? I
| haven't come across a visualised semantic map of state and
| federal Australian laws, cases and regulations before.
| epgui wrote:
| I think visualizing it like this is very strange. I am not a
| legal expert but I have read a lot of law textbooks.
|
| Normally, I'd expect blackletter law to form a somewhat sparse,
| tentacle-like structure.
|
| Case law (or "cases" or "jurisprudence") is by its nature largely
| interstitial: it consists of judges "filling in the holes" that
| are left by any unclear meaning (requiring interpretation) of
| blackletter law, or in some cases by the absence of such.
|
| Having case law and blackletter law form two distinct clusters
| makes no sense to me: I really think it's a domain modelling
| error. It's what I would expect to see if one applied a text
| similarity measure naively to some data set, without regard for
| the domain models.
| ubutler wrote:
| As I note in my article, the language and style employed in
| Australian judgments is different from that employed in
| statute. Furthermore, in common law countries like Australia,
| you have many legal concepts that have developed independently
| of statute and either remain independent or have been
| formalised into statute (see, eg, torts:
| https://www.alrc.gov.au/publication/traditional-rights-
| and-f...).
| epgui wrote:
| I understand that, but there is a difference between text
| similarity and semantic similarity. You claim to have
| performed semantic clustering, but what I am seeing, and what
| you are saying in your response to my comment, has less to do
| with semantics and more to do with superficial textual
| encodings.
|
| Case law and blackletter law will obviously look very
| different in terms of their textual representation, style,
| formatting, etc... And this will be true even when they
| pertain to the same ideas and the same concepts.
|
| To state the obvious, semantics is about the meaning of
| things, not about style and not about specific word choices
| or specific syntactical forms (although sometimes these carry
| meaning as well).
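
The distinction drawn above between textual and semantic similarity can
be made concrete with a purely surface-level measure. A minimal sketch,
assuming Jaccard overlap of word sets as the "text similarity"; the
three sentences are invented for illustration and none come from the
corpus:

```python
import re


def jaccard(a, b):
    """Share of distinct words the two texts have in common."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Statute-style wording.
statute = "A person must not drive a vehicle while intoxicated"
# Judgment-style wording about the same conduct.
judgment = "The appellant was behind the wheel of his car when drunk"
# Statute-style wording about different conduct.
other_statute = "A person must not sell a vehicle while unregistered"

same_meaning_score = jaccard(statute, judgment)
same_style_score = jaccard(statute, other_statute)
```

Here the two statute-style sentences about different conduct share
most of their words, while the judgment-style sentence about the same
conduct shares none. Embedding models are intended to look past this
kind of surface overlap; how far they succeed on legal text is the open
question in this sub-thread.
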
| ubutler wrote:
| > Furthermore, in common law countries like Australia, you
| have many legal concepts that have developed independently
| of statute and either remain independent or have been
| formalised into statute.
|
| This is the bigger point. In my own university studies,
| there was a clear segmentation between the common law and
| statute, although they are certainly interrelated.
|
| It's also worth noting that the boundaries between cases
| and legislation were not absolute, there were areas of the
| cases 'mainland' that contained legislation.
|
| My point on the style was that in addition to differences
| in purposes, they are also textually different, which can
| indeed bleed into semantics.
| epgui wrote:
| The point is not lost on me. Certainly tort law, contract
| law, administrative law, and many other areas of law
| aren't usually sourced from blackletter law as much as
| from jurisprudence or other sources of law.
|
| I think this very point you're trying to make would be
| more persuasive if the analysis had modelled the
| relationships that do exist between blackletter law and
| case law. As we have already discussed, text similarity
| may not suffice to reveal these relationships. And while
| these relationships don't always exist, when they do
| exist they are very strong.
| green-eclipse wrote:
| Would be great to see some of Fisk's cases in here /s
| throwup238 wrote:
| You need to get in contact with Rob Sitch. They can probably make
| a whole season of Utopia based around this!
| sevenseventen wrote:
| Mapping the internet as a whole has been a thing for quite a
| while, going back to Kumar et al in 2000.
| https://scholar.google.com/citations?view_op=view_citation&h...
|
| I recall at least one of those papers characterizing the shape as
| resembling a bow-tie.
|
| This and other early contributions were looking at the link
| structure of the internet, not textual similarity, though.
| IIAOPSW wrote:
| I've been dealing with some matters in the Australian legal
| system, for a long while self represented and self taught but
| recently with a solicitor. I've read a number of acts for myself,
| procedural civil and criminal, and have even run into the
| invisible wall between legislation and case law.
|
| This has been shockingly pertinent to my interests and I thank
| you for compiling it. My only gripe is that you didn't post it
| several months prior when it would have been most helpful to me
| ;)
| ubutler wrote:
| > I've read a number of acts for myself, procedural civil and
| criminal, and have even run into the invisible wall between
| legislation and case law.
|
| Glad to hear it corresponded with your lived experience, it
| really was surprising to see how the map correlated with my own
| understandings of the law developed through my degree!
| MisterDizzy wrote:
| Seems like quite a project. And very useful.
|
| Australia is the perfect example of when too many well-meaning
| people who think they can solve everything with more government
| power are given too much capability to see their vision through
| to its logical conclusion. It ends up making most of the problems
| it tries to solve far worse, and nobody has the guts to pull the
| plug on the programs that aren't functioning.
| techbrovanguard wrote:
| Clearly, the solution is more neoliberalism.
| sema4hacker wrote:
| Most of your work seems over my head, but doesn't the "mammoth"
| example indicate that by tweaking numbers you can end up getting
| just about any visual blob you want?
| jasonjei wrote:
| I've noticed in many commonwealth countries there is no official
| codification of case law, administrative law, and statutory law
| passed by the legislative body and receiving assent from the
| executive branch.
|
| The US being a hard fork of the commonwealth has the official US
| code and state codes--attempts to organize impacts of case law,
| admin law, passed law, etc--but Canada has pockets of
| codification (the Criminal Code), but not all acts of Parliament
| are organized in a single code. The UK as far as I can tell has
| no such thing in England or Wales. Hong Kong has some semblance
| of codification with the Basic Law and ordinances. Does Australia
| have codification at a federal or state level?
| dragonwriter wrote:
| > The US has the US code and state codes--it attempts to
| organize impacts of case law, admin law, passed law, etc
|
| Um, yes and no.
|
| "US Code" is statute law. The "Code of Federal Regulations" is
| admin law. There is no codification of case law; there are
| reporters, but they are just a flow of case results, similar to
| the sequential publication of statutes in places that don't
| codify statute law (places that do codify publish statutes
| sequentially too, but there the codification is the more
| generally useful reference for most purposes).
|
| The states are generally similar: there is codification of
| statute and admin law, but not of case law.
| jasonjei wrote:
| Got it. I was just wondering if other commonwealth countries
| had an equivalent of a US code or Code of Federal Regulations
| that documented law in a centralized store. IANAL, but law
| seems to have so many distinct sources--and more curiously,
| does codification help a lawyer with the job?
| adammarples wrote:
| Would it be correct in saying that a semantic map, clustered by
| meaning, might be pushing it? If the data are word embeddings,
| then you'd hope that they have distilled the semantics in the raw
| text but as you said yourself, they are also heavily influenced
| by style and who knows what else, to the point that semantically
| identical but syntactically different texts might have different
| clusters? Think, if half of the texts were in French, would you
| keep the same semantic map or would you have a French continent
| and an English continent?
| feliixh wrote:
| Great job, I intend to reproduce this on a similar dataset I've
| been collecting!
|
| I will say, it would be great to see the color labeling done on
| domain url alone, to see how much of the topography of the map is
| driven simply by the different formatting characteristics of the
| websites you're gathering data from.
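
The check suggested above, colouring each point by its source website
rather than by legal topic, only needs the host name of each document's
URL. A minimal sketch, assuming every document record carries a source
URL; `colour_by_domain`, the palette and the example URLs are
hypothetical:

```python
from urllib.parse import urlparse


def colour_by_domain(urls, palette):
    """Assign each URL a colour according to its host name."""
    assigned = {}
    colours = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host not in assigned:
            # Cycle through the palette if there are more hosts than colours.
            assigned[host] = palette[len(assigned) % len(palette)]
        colours.append(assigned[host])
    return colours


# Placeholder URLs for illustration only.
example_urls = [
    "https://example.gov.au/act/1",
    "https://example.org/case/2",
    "https://example.gov.au/act/3",
]
example_colours = colour_by_domain(example_urls, ["red", "blue"])
```

If the resulting colours line up closely with the clusters on the map,
that would suggest formatting differences between sources are driving
part of the layout.
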
| 6510 wrote:
| This is great, with so many laws it is hard to get any kind of
| overview.
| Hammershaft wrote:
| The central shape created by this dataviz could make for an
| interesting island shape.
___________________________________________________________________
(page generated 2024-03-22 23:00 UTC)