[HN Gopher] Understanding the PURL Specification (Package URL)
       ___________________________________________________________________
        
       Understanding the PURL Specification (Package URL)
        
       Author : todsacerdoti
       Score  : 63 points
       Date   : 2025-06-05 16:02 UTC (6 hours ago)
        
 (HTM) web link (fossa.com)
 (TXT) w3m dump (fossa.com)
        
       | emddudley wrote:
       | Not related to PURLs (Persistent URLs) administered by the
       | Internet Archive.
       | 
       | https://purl.archive.org/
        
         | CaliforniaKarl wrote:
         | Or PURLs in general, the concept for which was developed in
         | 1995, per
         | https://en.m.wikipedia.org/wiki/Persistent_uniform_resource_...
        
         | layer8 wrote:
         | Nor to the Purl programming language:
         | https://esolangs.org/wiki/Purl
         | 
         | I wonder if Yarn will support PURLs. ;)
        
         | 01HNNWZ0MV43FF wrote:
         | Where can I read more about this?
        
           | pombreda wrote:
           | We maintain the spec at
           | https://github.com/package-url/purl-spec
           | 
           | And the new thing, working towards making it a real standard
           | with Ecma https://tc54.org/purl/ ... :)
        
         | ttepasse wrote:
         | I remember when purl.org namespace URIs were the thing for RSS
         | 1.0 modules. 25 years ago.
        
         | pombreda wrote:
         | Not at all related. Just nicknamed the same.
        
       | 90s_dev wrote:
       | xkcd 927 is shown in the first link. It seems xkcd is now as
       | official a part of the everlasting software community as markdown
       | is.
        
         | pombreda wrote:
         | Actually, I also used it when I first presented PURL at FOSDEM
         | in 2018 https://archive.fosdem.org/2018/schedule/event/purl/
         | .... scroll the video at 9 minutes :] We need moooaaar
         | standards, do we?
        
       | rahkiin wrote:
       | How does the purl work for docker images that are not hosted on
       | docker? Or custom npm registries?
        
         | nonethewiser wrote:
         | Maybe fall into here?
         | 
         | >There's even a generic type as a catch-all for things that
         | don't fit an existing ecosystem (for example, a proprietary or
         | legacy component) or for ecosystems that build custom
         | distributions, such as yocto or buildroot. We should note,
         | however, that SBOM and software composition analysis tools vary
         | widely in their ability to understand generic PURLs, so we do
         | recommend you talk to your current (or prospective) vendor if
         | this is an important feature for you.
        
           | pombreda wrote:
           | You want to avoid the "generic" type... and for docker
           | containers and OCI images that's not needed.
        
         | LawnGnome wrote:
         | The standard supports a repository_url "qualifier" (query
         | parameter)[0], which can be used to override whatever the
         | default registry is (which, for Docker, is hub.docker.com[1]).
         | 
         | [0]: https://github.com/package-url/purl-spec/blob/main/PURL-SPEC...
         | 
         | [1]: https://github.com/package-url/purl-spec/blob/main/PURL-TYPE...
        
         | m4r71n wrote:
         | You can use the `oci` package type for non-Docker images (or
         | any OCI artifacts for that matter).
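
The repository_url qualifier mentioned above can be illustrated with a small sketch. This is a simplified builder, not the reference implementation: the function name `build_purl`, the image `team/myimage`, and the host `registry.example.com` are all hypothetical, and real tooling should follow the spec's full encoding rules.

```python
from urllib.parse import quote


def build_purl(pkg_type, name, version=None, namespace=None, qualifiers=None):
    """Assemble a PURL string from its components (simplified sketch)."""
    parts = ["pkg:", pkg_type.lower(), "/"]
    if namespace:
        # "/" separates namespace segments; each segment is encoded on its own.
        parts.append("/".join(quote(seg, safe="") for seg in namespace.split("/")))
        parts.append("/")
    parts.append(quote(name, safe=""))
    if version:
        parts.append("@" + quote(version, safe=""))
    if qualifiers:
        # Qualifiers are key=value pairs joined by "&", emitted in key order.
        pairs = [f"{k}={quote(v, safe='')}" for k, v in sorted(qualifiers.items()) if v]
        parts.append("?" + "&".join(pairs))
    return "".join(parts)


# Hypothetical image hosted on a registry other than the default one.
custom = build_purl(
    "docker",
    "myimage",
    version="1.2.3",
    namespace="team",
    qualifiers={"repository_url": "registry.example.com"},
)
print(custom)  # pkg:docker/team/myimage@1.2.3?repository_url=registry.example.com
```

Without the qualifier, the same call yields `pkg:docker/team/myimage@1.2.3`, which a consumer would resolve against the type's default registry.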
        
       | heavenlyhash wrote:
       | soo..... what's the guidance for when package names include a
       | slash?
       | 
       | such as approximately everything in golang, which very often
       | matches e.g. "github.com/*" as a package name?
       | 
       | Would PURL suggest that "github.com/foobar/go-whatnot" should
       | be parsed as namespace="github.com" (odd) and package name
       | "foobar/go-whatnot" (since there aren't any more slashes in the
       | blessed separators)?
        
         | conradludgate wrote:
         | I don't know, but I imagine those are actually the namespace.
         | E.g., I would imagine pkg:go/github.com/foo/bar@1.0.0 to be
         | package bar in the github.com/foo namespace.
         | 
         | The distinction doesn't really seem to matter much between
         | namespace and name in all honesty.
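
The reading described in this comment (last path segment is the name, everything between the type and the name is the namespace) can be sketched as below. This is a simplified splitter under that assumption, not a spec-complete parser; `split_purl` is a hypothetical helper, and qualifiers and subpaths are simply dropped.

```python
from urllib.parse import unquote


def split_purl(purl):
    """Split a PURL into type, namespace, name and version (simplified sketch)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError("not a PURL: missing 'pkg:' scheme")
    # Drop the subpath ("#...") and qualifiers ("?...") for this illustration.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    # The version follows the last "@", if there is one.
    path, sep, version = rest.rpartition("@")
    if not sep:
        path, version = rest, ""
    # Split on literal "/" first, then decode each segment separately.
    segments = [unquote(s) for s in path.strip("/").split("/") if s]
    if len(segments) < 2:
        raise ValueError("a PURL needs at least a type and a name")
    pkg_type, *namespace, name = segments
    return {
        "type": pkg_type.lower(),
        "namespace": "/".join(namespace) or None,
        "name": name,
        "version": unquote(version) or None,
    }


parsed = split_purl("pkg:go/github.com/foo/bar@1.0.0")
print(parsed["namespace"], parsed["name"])  # github.com/foo bar
```

Because segments are split before decoding, a percent-encoded slash stays inside its segment rather than creating a new one.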
        
           | pombreda wrote:
           | Agreed. In hindsight, I always wonder if it was a good idea
           | to have this split. At least the namespace is optional and
           | required only for certain package types.
        
         | layer8 wrote:
         | The canonical answer would be percent-encoding, so
         | _pkg:golang/github.com%2Ffoobar/go-whatnot_.
         | 
         | https://en.wikipedia.org/wiki/Percent-encoding#:~:text=accor...
        
         | Joker_vD wrote:
         | What's the guidance when URI paths include a slash?
         | pkg:github.com%2Ffoobar/go-whatnot
        
           | pombreda wrote:
           | This is not a valid PURL as it is missing a type, assuming
           | you wanted golang here.
           | 
           | It could be instead:
           | pkg:golang/github.com%2Ffoobar%2Fgo-whatnot
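
A minimal sketch of producing that encoded form with Python's standard library, assuming the whole Go path is treated as a single name segment as in the PURL just above (the variable names are illustrative only):

```python
from urllib.parse import quote, unquote

module_path = "github.com/foobar/go-whatnot"

# safe="" makes quote() encode "/" too; by default it leaves "/" untouched.
encoded_name = quote(module_path, safe="")
purl = "pkg:golang/" + encoded_name
print(purl)  # pkg:golang/github.com%2Ffoobar%2Fgo-whatnot

# Decoding the single name segment restores the original path.
assert unquote(encoded_name) == module_path
```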
        
         | pombreda wrote:
         | Encode the slash as explained in the clarified spec
         | https://github.com/package-url/purl-spec/pull/453 :)
         | 
         | We are working on further clarifying Golang, which is a bit
         | problematic: there is really no name or namespace in Go, just a
         | path, and it is not possible at scale to tell when a Go module
         | stops and when a Go package starts just by looking at the
         | path... this is going to be clarified after the merge of the PR
         | 453.
        
           | pombreda wrote:
           | Disclosure: I created that spec and we are working hard to
           | clarify it and remove grey areas!
        
       | dedicate wrote:
       | Okay, so PURL is basically the thing that actually makes SBOMs
       | usable for open source, not just a list of 'best guesses' with
       | CPEs?
        
         | pombreda wrote:
         | That's actually the best explanation I have seen in a long
         | time!
         | 
         | - in most cases, no guesses needed
         | - you can use it in CycloneDX, SPDX, and CSAF and still talk
         |   about the same package even if the format varies
         | - CVE.org is considering it as an addition on the same footing
         |   as CPE
         | - there are a good bunch of databases that "speak" PURL, like
         |   Google OSV, Sonatype OSS Index, Deps.dev, and AboutCode's
         |   PurlDB and VulnerableCode (disclosure: I am a lead maintainer
         |   for AboutCode FOSS projects)
         | - most scanners speak PURL too.
         | 
         | Note that some scanners and tools speak not exactly PURL but
         | some "PURLish" dialect and we have a project to help streamline
         | that and lift up the whole ecosystem of PURL users with
         | https://nlnet.nl/project/purlvalidator/
        
         | donenext wrote:
         | Yes, 1000x yes
        
       | alcroito wrote:
       | I wish PURL proposed something sensible or at least usable for
       | tracking C / C++ native libraries, that are NOT hosted on a
       | registry like conan.io, or one of the linux distro registries,
       | but is still (self-)hosted somewhere online.
       | 
       | For libraries that are hosted on `github`, there's at least the
       | github type.
       | 
       | But there is no official `gitlab` or `git` type, and I've read
       | comments that even the `github` type is considered a mistake.
       | 
       | One example of such a library could be a Qt or KDE / Plasma
       | library.
       | 
       | They are hosted on their own forges, https://code.qt.io/ and
       | https://invent.kde.org respectively.
       | 
       | So to the more knowledgeable people out there, what is the PURL
       | way of identifying a C++ library like that?
       | 
       | Is `generic` type + vcs_url qualifier really the only way?
       | 
       | Right now it seems impossible to track vulnerabilities for such
       | libraries with OSS / open tools, because none of the open tools
       | or databases support a custom type or registry or ecosystem.
       | 
       | For example none of services here support some custom C++
       | ecosystem (putting aside conan):
       | 
       | https://docs.dependencytrack.org/analysis-types/known-vulner...
       | 
       | Same for https://docs.dependencytrack.org/datasources/osv/
        
         | donenext wrote:
         | Completely agree here. A `git` type using the namespace of your
         | choice would be plenty to enable tools to find these packages.
         | Even though it's not "officially" supported in the spec, this is
         | what we do internally.
        
           | pombreda wrote:
           | IMHO, bare git stuff would be a git URL as specified in pip
           | and SPDX and not a PURL... I would be interested to know more
           | about your use case. Feel free to drop a note at
           | pombredanne@aboutcode.org
        
         | pombreda wrote:
         | Note that there should be a gitlab type as it is planned for:
         | https://github.com/package-url/purl-spec/blob/a90ee02679afc3...
         | 
         | gitlab and github do provide package-like discoverability. Do
         | you have a pointer that says a github package is a mistake?
        
         | pombreda wrote:
         | You wrote:
         | 
         | > So to the more knowledgeable people out there, what is the
         | PURL way of identifying a C++ library like that?
         | 
         | That's a blind spot. This is a real problem for everyone, as you
         | rightfully explained.
         | 
         | So I have been thinking a lot about how to track C/C++ native
         | libraries, and I have been working on a plan to deal with this.
         | 
         | You can read a summary there (that I just posted to supply this
         | discussion!):
         | - https://github.com/aboutcode-org/www.aboutcode.org/issues/30
         | 
         | And this comment links to a more detailed work-in-progress
         | planning doc:
         | - https://github.com/aboutcode-org/www.aboutcode.org/issues/30...
         | 
         | If you want to chip in and help, this would be awesome.
         | 
         | And IMHO, aligned with your thinking, this should not be tied to
         | a build system or a for-profit operation like conan.io, or a
         | linux distro, or for that matter a specific build tool or
         | approach, as there are so many; it should be self-hosted, easy
         | to sync, and simple to store in a git repo.
        
           | alcroito wrote:
           | Thanks for the links! I hope the proposal works out. I
           | skimmed through the doc, and one thing I'd suggest is to
           | consider using the CPS format rather than the ABOUT one for
           | the metadata. The format is driven by Kitware, the developers
           | of cmake, and thus if it's contributed to them, a big chunk
           | of the cpp ecosystem would get buy-in just because of the
           | inertia of using cmake, and getting it for free with the
           | tool.
           | 
           | https://cps-org.github.io/cps/overview.html
           | 
           | I'm not sure how I can help, but I'm open for discussion,
           | because the company I work for is also interested in how to
           | handle this well for our products.
        
       | RS-232 wrote:
       | I love PURLs, but the namespace attribute smells. It's way too
       | arbitrary.
       | 
       | What's the point of com.something.other? Why are we using dot
       | notation when everything else is kebab case?
        
         | pombreda wrote:
           | Not sure I parse... would you mind elaborating?
        
           | pombreda wrote:
           | Is this about Maven "groupid" mapped to a namespace?
           | "com.foo.bar" is Maven's own invention and notation.... in
           | most cases we are just trying to adopt the ecosystem
           | convention to minimize friction.
        
       | quibono wrote:
       | For all the expressiveness of the CPE format, I find PURLs much
       | easier to work with. Especially when it comes to software that
       | doesn't fall neatly into the classic vendor/product split like
       | what CPE envisions.
        
         | pombreda wrote:
         | Yeah, the CPE idea of a vendor for an open source package does
         | not compute too well!
         | 
         | FWIW, PURL came about as I could NOT put my mind around CPEs
         | when I was scanning for packages and deps with scancode and
         | could not find any easy way to go from that to looking up a
         | vulnerability/CVE in the NVD, as it was all guesswork and
         | manual.
         | 
         | So we started instead to put the vuln data in our own db, keyed
         | by something that would be easy to relate from the scans. This
         | eventually became PURL.
         | 
         | This is all tracked in these places:
         | - The original issue:
         |   https://github.com/aboutcode-org/scancode-toolkit/issues/805
         | - The initial pull request with many comments:
         |   https://github.com/package-url/purl-spec/pull/1
        
       | kdeldycke wrote:
       | I have a project called Meta Package Manager that supports pURLs,
       | so you can:
       | 
       | $ mpm install pkg:npm/left-pad@1.2.3
       | 
       | Other commands allow you to export the SBOM of all packages
       | installed on your machine. More info at:
       | https://github.com/kdeldycke/meta-package-manager
        
         | pombreda wrote:
         | This is awesomely nice!
        
       | donenext wrote:
       | Hot take, `generic` as a type is a crutch most tooling uses out
       | of laziness and has significantly reduced the usefulness of PURL
       | spec. How do we improve this?
        
         | donenext wrote:
         | Can we completely eliminate generic as a type to remove this
         | crutch?
        
           | pombreda wrote:
           | All abstractions leak eventually, so we need that escape
           | hatch IMHO. Otherwise you end up with the other issue which
           | is that there is stuff you cannot track with PURL?
        
         | jessoteric wrote:
         | isn't the issue that sometimes a given scanner can't know from
         | where the package is sourced?
         | 
         | like if I'm scanning an arbitrary linux system, and I see
         | `libssl.so.1` but I don't see it in the local package manager,
         | I don't really have an option other than to call it generic.
         | 
         | I do agree that "generic" seems to be WAY overused though.
         | Maybe tools that report on SBOMs, like FOSSA or whatever,
         | should emit warnings to users about "generic" PURLs.
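
The warning idea in this comment could look roughly like the sketch below. It only inspects the type component of each PURL string; `find_generic_purls` and the sample list are hypothetical and not taken from any existing tool.

```python
def find_generic_purls(purls):
    """Return the PURLs whose type component is 'generic' (case-insensitive)."""
    flagged = []
    for purl in purls:
        scheme, _, rest = purl.partition(":")
        # The type is everything up to the first "/" after the scheme.
        pkg_type = rest.lstrip("/").split("/", 1)[0].lower()
        if scheme == "pkg" and pkg_type == "generic":
            flagged.append(purl)
    return flagged


# Hypothetical component list, as might be extracted from an SBOM.
components = [
    "pkg:npm/left-pad@1.2.3",
    "pkg:generic/openssl@1.1.1g",
]
for purl in find_generic_purls(components):
    print(f"warning: {purl} uses the 'generic' type; its origin is unknown")
```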
        
           | donenext wrote:
           | That's fair. It just seems silly that a spec intended to
           | "uniquely ID a package" supports a type that is the complete
           | opposite of "unique". I guess another way to frame my take is:
           | should `generic` be considered a valid PURL? Keep it as a
           | fallback, sure, but distinguish between "fully qualified" PURLs
           | and "partial" PURLs.
           | 
           | This then gives tooling a path to prompt users to provide
           | missing context needed to fully qualify the PURL
        
             | pombreda wrote:
             | > distinguish between "fully qualified" PURLs and "partial"
             | PURLs.
             | 
             | Can you tell a bit more? Not sure I get what you meant
        
             | jessoteric wrote:
             | That seems like a good idea... hmm.
        
           | pombreda wrote:
           | > isn't the issue that sometimes a given scanner can't know
           | from where the package is sourced?
           | 
           | That's the problem: there is no metadata with or in
           | libssl.so.1 that I can reliably use to tell what this is
           | 
           | Eventually I can see a solution made of:
           | 
           | 1. create the metadata, say a simple YAML or deb822 key-value
           |    pair file that can then be included upstream or as an
           |    overlay
           | 2. define a simple spec for binary formats to include a PURL
           |    (say in an ELF section or a WinPE string of sorts, where
           |    many of these are already stored)
           | 3. create content-based tools like we have in PurlDB to match
           |    code, but maybe more like a bunch of generated yara rules
           |    that would match symbols and strings from source to binaries
           |    and can recognize that libssl.so.1 is from OpenSSL 1.1.1g.
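
To make the second step concrete: if a binary did carry its PURL as a plain ASCII string, a scanner could recover it with a byte-level search. No such embedding spec exists in this thread, so everything below is an assumption, including the loose pattern `PURL_RE` (which is not the spec grammar) and the fake binary contents.

```python
import re

# Loose pattern for a PURL-looking ASCII string; an assumption for this sketch,
# not the grammar from the PURL spec.
PURL_RE = re.compile(rb"pkg:[a-z][a-z0-9.+-]*/[A-Za-z0-9._~%/@:+?=&#-]+")


def find_embedded_purls(blob):
    """Return the distinct PURL-like strings found in a bytes object, sorted."""
    return sorted({match.decode("ascii") for match in PURL_RE.findall(blob)})


# Fake binary contents: an ELF magic number followed by NUL-separated strings.
fake_binary = b"\x7fELF\x00\x00pkg:generic/openssl@1.1.1g\x00more\x00"
print(find_embedded_purls(fake_binary))  # ['pkg:generic/openssl@1.1.1g']
```

A real scanner would read the relevant section of the file rather than the whole blob, and would validate each candidate with a proper PURL parser.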
        
         | pombreda wrote:
         | Yeah, I added generic as an escape hatch, but this should be
         | only used by exception, e.g., a crutch. An abused crutch.
         | 
         | Eventually, let's fix this first for C/C++:
         | 
         | https://github.com/aboutcode-org/www.aboutcode.org/issues/30
         | 
         | And based on that approach we can either:
         | 1. create new, sensible types as needed
         | 2. and/or maintain a last resort open registry of generic types
         |    at least so we get some sanity in the process.
        
       | zzo38computer wrote:
       | In my opinion, there are some problems with this, such as:
       | 
       | - The cryptographic hash is not included. (They do mention
       | security; a hash and/or public keys would be helpful for
       | security. It would also be helpful for identification if names
       | are reused for unrelated reasons.)
       | 
       | - There is not a distinction between interfaces and
       | implementations (which in some cases you might care about,
       | although not always).
       | 
       | - They do not mention examples of what qualifiers are possible
       | for some package types.
        
         | pombreda wrote:
         | Can you tell a bit more? What is this? The OP article?
        
       ___________________________________________________________________
       (page generated 2025-06-05 23:00 UTC)