[HN Gopher] Understanding the PURL Specification (Package URL)
___________________________________________________________________
Understanding the PURL Specification (Package URL)
Author : todsacerdoti
Score : 63 points
Date : 2025-06-05 16:02 UTC (6 hours ago)
(HTM) web link (fossa.com)
(TXT) w3m dump (fossa.com)
| emddudley wrote:
| Not related to PURLs (Persistent URLs) administered by the
| Internet Archive.
|
| https://purl.archive.org/
| CaliforniaKarl wrote:
| Or PURLs in general, the concept for which was developed in
| 1995, per
| https://en.m.wikipedia.org/wiki/Persistent_uniform_resource_...
| layer8 wrote:
| Nor to the Purl programming language:
| https://esolangs.org/wiki/Purl
|
| I wonder if Yarn will support PURLs. ;)
| 01HNNWZ0MV43FF wrote:
| Where can I read more about this?
| pombreda wrote:
| We maintain the spec at https://github.com/package-url/purl-spec
|
| And the new thing: we are working towards making it a real
| standard with Ecma https://tc54.org/purl/ ... :)
| ttepasse wrote:
| I remember when purl.org namespace URIs were the thing for RSS
| 1.0 modules. 25 years ago.
| pombreda wrote:
| Not at all related. Just nicknamed the same.
| 90s_dev wrote:
| xkcd 927 is shown in the first link. It seems xkcd is now as
| official a part of the everlasting software community as markdown
| is.
| pombreda wrote:
| Actually, I also used it when I first presented PURL at FOSDEM
| in 2018 https://archive.fosdem.org/2018/schedule/event/purl/
| .... skip to 9 minutes into the video :] We need moooaaar
| standards, don't we?
| rahkiin wrote:
| How does PURL work for Docker images that are not hosted on
| Docker Hub? Or for custom npm registries?
| nonethewiser wrote:
| Maybe they fall under this?
|
| >There's even a generic type as a catch-all for things that
| don't fit an existing ecosystem (for example, a proprietary or
| legacy component) or for ecosystems that build custom
| distributions, such as yocto or buildroot. We should note,
| however, that SBOM and software composition analysis tools vary
| widely in their ability to understand generic PURLs, so we do
| recommend you talk to your current (or prospective) vendor if
| this is an important feature for you.
| pombreda wrote:
| You want to avoid the "generic" type... and for Docker
| containers and OCI images it's not needed.
| LawnGnome wrote:
| The standard supports a repository_url "qualifier" (query
| parameter)[0], which can be used to override whatever the
| default registry is (which, for Docker, is hub.docker.com[1]).
|
| [0]: https://github.com/package-url/purl-spec/blob/main/PURL-SPEC...
|
| [1]: https://github.com/package-url/purl-spec/blob/main/PURL-TYPE...
| m4r71n wrote:
| You can use the `oci` package type for non-Docker images (or
| any OCI artifacts for that matter).
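To make the `repository_url` answer concrete, here is a minimal stdlib-only Python sketch (not the packageurl-python library) that assembles a PURL with that qualifier. Assumptions: qualifier keys are lowercased and sorted and empty values dropped, as the spec describes; ':' is left unencoded everywhere and '/' inside qualifier values; subpaths are not handled; `npm.example.com` is a hypothetical private registry, and the Docker example is modeled on the one in the PURL types doc.

```python
from urllib.parse import quote


def _enc(value: str, safe: str = "") -> str:
    """Percent-encode one PURL component; ':' is left readable (assumption)."""
    return quote(value, safe=safe).replace("%3A", ":")


def build_purl(ptype, name, namespace=None, version=None, qualifiers=None):
    """Assemble pkg:type/namespace/name@version?qualifiers (no subpath support)."""
    out = "pkg:" + ptype.lower() + "/"
    if namespace:
        # Each namespace segment is encoded separately; '/' stays a separator.
        out += "/".join(_enc(seg) for seg in namespace.split("/")) + "/"
    out += _enc(name)
    if version:
        out += "@" + _enc(version)
    # Lowercase keys, drop empty values, sort by key.
    items = sorted((k.lower(), v) for k, v in (qualifiers or {}).items() if v)
    if items:
        # '/' is kept as-is inside qualifier values so URLs stay readable.
        out += "?" + "&".join(f"{k}={_enc(v, safe='/')}" for k, v in items)
    return out


# Image hosted on a non-default registry (overrides the Docker Hub default).
print(build_purl("docker", "dockerimage", namespace="customer",
                 version="sha256:244fd47e07d10",
                 qualifiers={"repository_url": "gcr.io"}))
# Package from a hypothetical private npm registry.
print(build_purl("npm", "left-pad", version="1.2.3",
                 qualifiers={"repository_url": "https://npm.example.com"}))
```

Per-type qualifier conventions (including for the `oci` type) are worth checking in the PURL types doc before relying on this.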
| heavenlyhash wrote:
| soo..... what's the guidance for when package names include a
| slash?
|
| such as approximately everything in golang, which very often
| matches e.g. "github.com/*" as a package name?
|
| Would PURL suggest that "github.com/foobar/go-whatnot" should
| be parsed as namespace="github.com" (odd) and package name
| "foobar/go-whatnot" (since there aren't any more slashes in the
| blessed separators)?
| conradludgate wrote:
| I don't know, but I imagine those are actually the namespace.
| E.g., I would imagine pkg:go/github.com/foo/bar@1.0.0 to be
| package bar in the github.com/foo namespace.
|
| The distinction between namespace and name doesn't really seem
| to matter much, in all honesty.
| pombreda wrote:
| Agreed. In hindsight, I always wonder if it was a good idea
| to have this split. At least the namespace is optional and
| required only for certain package types.
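A rough Python sketch of how a parser would split conradludgate's example: everything between the type and the last path segment becomes the namespace. This is a simplified stdlib-only reading (no validation, no type-specific normalization, subpath left encoded). Note also that the registered type for Go appears to be `golang`, as used elsewhere in this thread, not `go`.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split a PURL string into its components (simplified, no validation)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a pkg: URL: {purl!r}")
    rest, hash_sep, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    ptype, _, remainder = rest.strip("/").partition("/")
    # The version follows the last '@'; a literal '@' elsewhere must be %40.
    path, at, version = remainder.rpartition("@")
    if not at:
        path, version = remainder, ""
    segments = path.split("/")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)
    return {
        "type": ptype.lower(),
        # All segments but the last form the namespace.
        "namespace": "/".join(unquote(s) for s in segments[:-1]) or None,
        "name": unquote(segments[-1]),
        "version": unquote(version) or None,
        "qualifiers": qualifiers,
        "subpath": subpath if hash_sep else None,  # left percent-encoded here
    }


print(parse_purl("pkg:go/github.com/foo/bar@1.0.0"))
```

Because splitting on '/' happens before decoding, an encoded `%2F` stays inside a single segment, which is what makes the percent-encoding answer below work.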
| layer8 wrote:
| The canonical answer would be percent-encoding, so
| _pkg:golang/github.com%2Ffoobar/go-whatnot_.
|
| https://en.wikipedia.org/wiki/Percent-encoding#:~:text=accor...
| Joker_vD wrote:
| What's the guidance when URI paths include a slash?
| pkg:github.com%2Ffoobar/go-whatnot
| pombreda wrote:
| This is not a valid PURL as it is missing a type, assuming
| you wanted golang here.
|
| It could be instead:
| pkg:golang/github.com%2Ffoobar%2Fgo-whatnot
| pombreda wrote:
| Encode the slash as explained in the clarified spec
| https://github.com/package-url/purl-spec/pull/453 :)
|
| We are working on further clarifying Golang, which is a bit
| problematic: there is really no name or namespace in Go, just a
| path, and at scale it is not possible to tell where a Go module
| ends and a Go package starts just by looking at the path... This
| is going to be clarified after PR 453 is merged.
| pombreda wrote:
| Disclosure: I created that spec and we are working hard to
| clarify it and remove grey areas!
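The fully encoded form suggested above (pkg:golang/github.com%2Ffoobar%2Fgo-whatnot) can be produced and read back with the standard library. A minimal sketch, assuming the entire Go import path is treated as the PURL name with every '/' percent-encoded, pending the Go clarification mentioned; both helper names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import quote, unquote

PREFIX = "pkg:golang/"


def golang_purl_from_path(import_path: str, version: Optional[str] = None) -> str:
    """Hypothetical helper: encode the whole Go path as a single name segment."""
    # safe="" makes quote() encode '/' as %2F, so no namespace is implied.
    purl = PREFIX + quote(import_path, safe="")
    if version:
        purl += "@" + quote(version, safe="")
    return purl


def golang_path_from_purl(purl: str) -> Tuple[str, Optional[str]]:
    """Hypothetical inverse: recover the import path and optional version."""
    if not purl.startswith(PREFIX):
        raise ValueError(f"not a golang PURL: {purl!r}")
    body = purl[len(PREFIX):].split("#", 1)[0].split("?", 1)[0]
    # The encoded name contains no raw '@', so the first '@' starts the version.
    name, _, version = body.partition("@")
    return unquote(name), (unquote(version) or None)


purl = golang_purl_from_path("github.com/foobar/go-whatnot", "v1.0.0")
print(purl)  # pkg:golang/github.com%2Ffoobar%2Fgo-whatnot@v1.0.0
print(golang_path_from_purl(purl))
```

This only carries the path losslessly; it does not settle the module-vs-package ambiguity, so a consumer still has to work out which module the path belongs to.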
| dedicate wrote:
| Okay, so PURL is basically the thing that actually makes SBOMs
| usable for open source, not just a list of 'best guesses' with
| CPEs?
| pombreda wrote:
| That's actually the best explanation I have seen in a long
| time!
|
| - In most cases, no guesses are needed.
| - You can use it in CycloneDX, SPDX, and CSAF and still talk
| about the same package even if the format varies.
| - CVE.org is considering it as an addition on the same footing
| as CPE.
| - There are a good number of databases that "speak" PURL, like
| Google OSV, Sonatype OSS Index, Deps.dev, and AboutCode's
| PurlDB and VulnerableCode (disclosure: I am a lead maintainer
| for AboutCode FOSS projects).
| - Most scanners speak PURL too.
|
| Note that some scanners and tools speak not exactly PURL but
| some "PURLish" dialect, and we have a project to help
| streamline that and lift up the whole ecosystem of PURL users
| with https://nlnet.nl/project/purlvalidator/
| donenext wrote:
| Yes, 1000x yes
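Since Google OSV is named as one of the databases that speak PURL, here is a sketch of what a lookup keyed by PURL might look like. It only builds the request for OSV's public `POST /v1/query` endpoint with a `package.purl` field (that endpoint and field exist as far as I know; check the OSV API docs before relying on it) and does not send it.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_query_request(purl: str) -> urllib.request.Request:
    """Build (but do not send) an OSV vulnerability query keyed by PURL."""
    # The version is carried inside the PURL, so no separate "version" field.
    body = json.dumps({"package": {"purl": purl}}).encode("utf-8")
    return urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = osv_query_request("pkg:npm/left-pad@1.2.3")
print(req.get_method(), req.full_url, req.data.decode())
# To actually send it: pass req to urllib.request.urlopen() and json.load the response.
```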
| alcroito wrote:
| I wish PURL proposed something sensible or at least usable for
| tracking C / C++ native libraries, that are NOT hosted on a
| registry like conan.io, or one of the linux distro registries,
| but is still (self-)hosted somewhere online.
|
| For libraries that are hosted on `github`, there's at least the
| github type.
|
| But there is no official `gitlab` or `git` type, and I've read
| comments that even the `github` type is considered a mistake.
|
| One example of such a library could be a Qt or KDE / Plasma
| library.
|
| They are hosted on their own forges, https://code.qt.io/ and
| https://invent.kde.org respectively.
|
| So to the more knowledgeable people out there, what is the PURL
| way of identifying a C++ library like that?
|
| Is `generic` type + vcs_url qualifier really the only way?
|
| Right now it seems impossible to track vulnerabilities for such
| libraries with OSS / open tools, because none of the open tools
| or databases support a custom type or registry or ecosystem.
|
| For example, none of the services here support a custom C++
| ecosystem (putting aside conan):
|
| https://docs.dependencytrack.org/analysis-types/known-vulner...
|
| Same for https://docs.dependencytrack.org/datasources/osv/
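For reference, this is roughly what the `generic` + `vcs_url` fallback looks like. A minimal sketch with these assumptions: the `vcs_url` value follows the `git+https://...` style, ':' and '/' are left unencoded in the qualifier value, and the Qt repository URL and version are illustrative only.

```python
from urllib.parse import quote


def generic_purl(name: str, version: str, vcs_url: str) -> str:
    """Build a pkg:generic PURL that points at its source repo via vcs_url."""
    enc_name = quote(name, safe="")
    enc_version = quote(version, safe="")
    # '+' in "git+https" becomes %2B; ':' and '/' are kept for readability.
    enc_vcs = quote(vcs_url, safe=":/")
    return f"pkg:generic/{enc_name}@{enc_version}?vcs_url={enc_vcs}"


print(generic_purl("qtbase", "6.5.0", "git+https://code.qt.io/qt/qtbase.git"))
```

This records provenance, but as noted in the thread, tools vary widely in what they do with generic PURLs, so it does not by itself make vulnerability lookups work.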
| donenext wrote:
| Completely agree here: a `git` type using the namespace of your
| choice would be plenty to enable tools to find these packages.
| Even though it's not "officially" supported in the spec, this
| is what we do internally.
| pombreda wrote:
| IMHO, a bare git reference would be a git URL as specified in
| pip and SPDX, and not a PURL... I would be interested to know
| more about your use case. Feel free to drop a note at
| pombredanne@aboutcode.org
| pombreda wrote:
| Note that there should be a gitlab type as it is planned for:
| https://github.com/package-url/purl-spec/blob/a90ee02679afc3...
|
| gitlab and github do provide package-like discoverability. Do
| you have a pointer that says a github package is a mistake?
| pombreda wrote:
| You wrote:
|
| > So to the more knowledgeable people out there, what is the
| PURL way of identifying a C++ library like that?
|
| That's a blind spot. This is a real problem for everyone, as
| you rightfully explained.
|
| So I have been thinking a lot about how to track C/C++ native
| libraries, and I have been working on a plan to deal with this.
|
| You can read a summary there (that I just posted for this
| discussion!) - https://github.com/aboutcode-org/www.aboutcode.org/issues/30
|
| And this comment links to a more detailed work-in-progress
| planning doc: - https://github.com/aboutcode-org/www.aboutcode.org/issues/30...
|
| If you want to chip in and help, this would be awesome.
|
| And IMHO, in line with your thinking, this should not be tied to
| a build system or a for-profit operation like conan.io, or to a
| Linux distro, or for that matter to a specific build tool or
| approach, as there are so many. It should be self-hosted, easy
| to sync, and simple to store in a git repo.
| alcroito wrote:
| Thanks for the links! I hope the proposal works out. I
| skimmed through the doc, and one thing I'd suggest is to
| consider using the CPS format rather than the ABOUT one for
| the metadata. The format is driven by Kitware, the developers
| of CMake, and thus if it's contributed to them, a big chunk
| of the C++ ecosystem would get buy-in just because of the
| inertia of using CMake, and getting it for free with the
| tool.
|
| https://cps-org.github.io/cps/overview.html
|
| I'm not sure how I can help, but I'm open to discussion,
| because the company I work for is also interested in how to
| handle this well for our products.
| RS-232 wrote:
| I love PURLs, but the namespace attribute smells. It's way too
| arbitrary.
|
| What's the point of com.something.other? Why are we using dot
| notation when everything else is kebab case?
| pombreda wrote:
| Not sure I parse... would you mind elaborating?
| pombreda wrote:
| Is this about the Maven "groupId" mapped to a namespace?
| "com.foo.bar" is Maven's own invention and notation... in
| most cases we are just trying to adopt the ecosystem
| convention to minimize friction.
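To illustrate that the dotted Maven namespace is simply the groupId carried over as-is, a minimal sketch mapping a `groupId:artifactId:version` coordinate to a PURL. It ignores classifiers and packaging, and the example coordinate is modeled on the Maven example in the spec.

```python
from urllib.parse import quote


def maven_gav_to_purl(gav: str) -> str:
    """Map 'groupId:artifactId:version' to pkg:maven/groupId/artifactId@version."""
    group_id, artifact_id, version = gav.split(":")  # ValueError if not 3 parts
    # The dotted groupId is kept verbatim as the namespace: no dot-to-slash rewrite.
    return "pkg:maven/{}/{}@{}".format(
        quote(group_id, safe=""), quote(artifact_id, safe=""), quote(version, safe="")
    )


print(maven_gav_to_purl("org.apache.commons:io:1.3.4"))
```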
| quibono wrote:
| For all the expressiveness of the CPE format, I find PURLs much
| easier to work with, especially when it comes to software that
| doesn't fall neatly into the classic vendor/product split that
| CPE envisions.
| pombreda wrote:
| Yeah, the CPE idea of a vendor for an open source package does
| not compute too well!
|
| FWIW, PURL came about because I could NOT get my head around
| CPEs when I was scanning for packages and deps with scancode
| and could not find any easy way to go from that to looking up
| a vulnerability/CVE in the NVD, as it was all guesswork and
| manual.
|
| So instead we started to put the vuln data in our own db, keyed
| by something that would be easy to relate from the scans. This
| eventually became PURL.
|
| This is all tracked in these places:
|
| - The original issue:
| https://github.com/aboutcode-org/scancode-toolkit/issues/805
| - The initial pull request with many comments:
| https://github.com/package-url/purl-spec/pull/1
| kdeldycke wrote:
| I have a project called Meta Package Manager that supports pURLs,
| so you can:
|
| $ mpm install pkg:npm/left-pad@1.2.3
|
| Other commands allow you to export the SBOM of all packages
| installed on your machine. More info at:
| https://github.com/kdeldycke/meta-package-manager
| pombreda wrote:
| This is awesomely nice!
| donenext wrote:
| Hot take: `generic` as a type is a crutch that most tooling uses
| out of laziness, and it has significantly reduced the usefulness
| of the PURL spec. How do we improve this?
| donenext wrote:
| Can we completely eliminate generic as a type to remove this
| crutch?
| pombreda wrote:
| All abstractions leak eventually, so we need that escape
| hatch IMHO. Otherwise you end up with the other issue, which
| is that there is stuff you cannot track with PURL.
| jessoteric wrote:
| isn't the issue that sometimes a given scanner can't know from
| where the package is sourced?
|
| like if I'm scanning an arbitrary linux system, and I see
| `libssl.so.1` but I don't see it in the local package manager,
| I don't really have an option other than to call it generic.
|
| I do agree that "generic" seems to be WAY overused though.
| Maybe tools that report on SBOMs, like FOSSA or whatever,
| should emit warnings to users about "generic" PURLs.
| donenext wrote:
| That's fair. It just seems silly that a spec intended to
| "uniquely ID a package" supports a type that is the complete
| opposite of "unique". I guess another way to frame my take is:
| should `generic` be considered a valid PURL? Keep it as a
| fallback, sure, but distinguish between "fully qualified" PURLs
| and "partial" PURLs.
|
| This then gives tooling a path to prompt users to provide
| missing context needed to fully qualify the PURL
| pombreda wrote:
| > distinguish between "fully qualified" PURLs and "partial"
| PURLs.
|
| Can you tell me a bit more? Not sure I get what you meant.
| jessoteric wrote:
| That seems like a good idea... hmm.
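One possible reading of the "fully qualified" vs "partial" distinction, written as a hypothetical check a tool could run before prompting the user for missing context. The two rules (a `generic` type, or a missing version, makes a PURL partial) are an assumption for illustration, not something the spec defines.

```python
def missing_context(purl: str) -> list:
    """Hypothetical check: list what keeps a PURL from being 'fully qualified'."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a pkg: URL: {purl!r}")
    # Drop subpath and qualifiers; they don't affect these two rules.
    path = purl[4:].split("#", 1)[0].split("?", 1)[0].strip("/")
    ptype, _, remainder = path.partition("/")
    reasons = []
    if ptype.lower() == "generic":
        reasons.append("generic type: ecosystem unknown")
    if "@" not in remainder:
        reasons.append("no version")
    return reasons


for p in ("pkg:npm/left-pad@1.2.3", "pkg:generic/libssl.so.1"):
    print(p, missing_context(p) or "fully qualified")
```

An empty list would mean "fully qualified"; each reason string is what a tool could turn into a prompt for the user.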
| pombreda wrote:
| > isn't the issue that sometimes a given scanner can't know
| from where the package is sourced?
|
| That's the problem: there is no metadata with or in
| libssl.so.1 that I can reliably use to tell what this is.
|
| Eventually I can see a solution made of:
|
| 1. Create the metadata, say a simple YAML or deb822 key-value
| pair file that can then be included upstream or as an overlay.
| 2. Define a simple spec for binary formats to include a PURL
| (say in an ELF section or a WinPE string or some such, where
| many of these are already stored).
| 3. Create content-based tools like we have in PurlDB to match
| code, but maybe more like a bunch of generated YARA rules that
| would match symbols and strings from source to binaries and
| can recognize that libssl.so.1 is from OpenSSL 1.1.1g.
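If binaries embedded their PURL (the second idea above), identification becomes a string search. A rough sketch of the reading side, assuming (hypothetically) the PURL is stored as a plain NUL-terminated ASCII string somewhere in the file. It is just a `strings`-style scan, not an ELF or PE parser, and it will also pick up any PURL-looking text that happens to be present.

```python
import re
from typing import List

# "pkg:" + a type (letter first, then letters/digits/'.'/'+'/'-') + '/' + printable ASCII.
PURL_RE = re.compile(rb"pkg:[A-Za-z][A-Za-z0-9.+-]*/[\x21-\x7e]+")


def find_embedded_purls(blob: bytes) -> List[str]:
    """Return every PURL-looking ASCII string found in a binary blob."""
    return [m.group().decode("ascii") for m in PURL_RE.finditer(blob)]


# Synthetic stand-in for a binary; a real tool would read the file's bytes.
fake_binary = b"\x7fELF\x02\x01\x00\x00junk\x00pkg:generic/openssl@1.1.1g\x00tail"
print(find_embedded_purls(fake_binary))  # ['pkg:generic/openssl@1.1.1g']
```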
| pombreda wrote:
| Yeah, I added generic as an escape hatch, but it should only be
| used as an exception, i.e., a crutch. An abused crutch.
|
| Eventually, let's fix this first for C/C++:
|
| https://github.com/aboutcode-org/www.aboutcode.org/issues/30
|
| And based on that approach we can:
|
| 1. create new, sensible types as needed, and/or
| 2. maintain a last-resort open registry of generic types, so we
| at least get some sanity in the process.
| zzo38computer wrote:
| In my opinion, there are some problems with this, such as:
|
| - The cryptographic hash is not included. (They do mention
| security, a hash and/or public keys would be helpful for
| security. It would also be helpful for identification if names
| are reused for unrelated reasons.)
|
| - There is not a distinction between interfaces and
| implementations (which in some cases you might care about,
| although not always).
|
| - They do not mention examples of what qualifiers are possible
| for some package types.
| pombreda wrote:
| Can you tell me a bit more? What does "this" refer to, the OP
| article?
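On the first point (hashes): as far as I know, the spec lists a `checksum` qualifier holding comma-separated `algorithm:hexvalue` items. A minimal sketch that computes a SHA-256 over an artifact's bytes and adds it as that qualifier, keeping qualifiers sorted by key. The helper name is hypothetical, and existing qualifier values are copied through untouched.

```python
import hashlib


def with_checksum(purl: str, data: bytes) -> str:
    """Hypothetical helper: add a sha256 'checksum' qualifier for the given bytes."""
    base, hash_sep, subpath = purl.partition("#")
    path, _, query = base.partition("?")
    # partition("=")[::2] yields (key, value); a bare key gets an empty value.
    qualifiers = dict(pair.partition("=")[::2] for pair in query.split("&") if pair)
    qualifiers["checksum"] = "sha256:" + hashlib.sha256(data).hexdigest()
    # Qualifiers are kept sorted by key, as in a canonical PURL.
    query = "&".join(f"{k}={v}" for k, v in sorted(qualifiers.items()))
    return f"{path}?{query}" + (f"#{subpath}" if hash_sep else "")


print(with_checksum("pkg:npm/left-pad@1.2.3", b"example artifact bytes"))
```

A checksum identifies content; it is not a signature, so verifying publisher keys would remain a separate concern.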
___________________________________________________________________
(page generated 2025-06-05 23:00 UTC)