[HN Gopher] Transparent Telemetry for Open-Source Projects
___________________________________________________________________
Transparent Telemetry for Open-Source Projects
Author : trulyrandom
Score : 177 points
Date : 2023-02-08 13:19 UTC
(HTM) web link (research.swtch.com)
| zzzeek wrote:
| Was hoping for a big, highly designed webpage with "enter your
| GitHub URL here." But alas.
|
| (it did say "transparent", like a service people opt into that
| could relate installations to github URLs)
| blibble wrote:
| how long until ads?
| philosopher1234 wrote:
| There's a lot of confusion in these comments about opt-out vs
| opt-in. The debate isn't settled, but a lot of the issues raised
| here have been addressed. Reposting Russ' comment:
|
| > Longer answer about opt-out generally, copied from mail I sent
| to golang-dev.
|
| > I wrote a little about this at
| https://research.swtch.com/telemetry-design#opt-out. Just to
| quote the beginning:
|
| "An explicit goal of this design is to build a system that is
| reasonable to have enabled by default, for two reasons. First,
| the vast majority of users do not change any default settings. In
| systems that have collection off by default, opt-in rates tend to
| be very low, skewing the results toward power users who
| understand the system well. Second, the existence of an opt-in
| checkbox is in my opinion too often used as justification for
| collecting far more data than is necessary. Aiming for an opt-out
| system with as few reasons as possible to opt out led to this
| minimal design instead. Also, because the design collects a fixed
| number of samples, more systems being opted in means collecting
| less from any given system, reducing the privacy impact to each
| individual system."
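|
| To make the sampling arithmetic concrete: if the collector wants a
| fixed number of reports per collection period, spread evenly over every
| participating installation, each installation's chance of being asked
| is roughly target/participants. The sketch below assumes exactly that
| uniform model; the target of 10,000 and the function name
| selectionProbability are illustrative, not taken from the design.

```go
package main

import "fmt"

// selectionProbability returns the chance that any one participating
// installation is asked to upload, assuming the collector wants a fixed
// number of reports (target) spread uniformly over all participants.
// Both parameters are illustrative; the real design's numbers may differ.
func selectionProbability(target, participants int) float64 {
	if participants <= 0 {
		return 0
	}
	p := float64(target) / float64(participants)
	if p > 1 {
		return 1 // fewer participants than the target: everyone reports
	}
	return p
}

func main() {
	for _, n := range []int{10_000, 100_000, 1_000_000} {
		fmt.Printf("%9d participants -> p = %.4f\n", n, selectionProbability(10_000, n))
	}
}
```

| Under this model, going from 100,000 to 1,000,000 participants cuts
| each system's selection probability from 0.1 to 0.01.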
|
| > To elaborate, one of the core things I believe about designing
| a system like Go is that it needs to ship with the right
| defaults, rather than require users to reconfigure the defaults
| to get best practices for using that system. For example, Go
| ships with use of the Go module mirror (proxy.golang.org) enabled
| by default, so that users get more reliable builds out of the
| box. Similarly, Go ships with the use of the checksum database
| also enabled by default, so that users get verified module
| downloads out of the box. We know that most users don't want to
| and probably won't spend time reconfiguring the system: they
| trust us to set it up right instead. Of course, that implies a
| responsibility to actually look out for users' best interests,
| and we take that very seriously. There are important privacy
| concerns about the module mirror and about the checksum database,
| despite their clear benefits, so we designed those systems to
| address as many of those concerns as possible. Among the
| decisions we made to improve privacy there: (1) GOPROXY can proxy
| both the module mirror and the checksum database, (2) we
| published a very clear privacy policy (proxy.golang.org/privacy),
| (3) we introduced the concept of a tiled transparency log to keep
| log fetches from exposing a potential tracking signal.
|
| > Moving back to telemetry, enabling telemetry does not confer
| the same kind of direct benefits to users as the module mirror
| and the checksum database do. Instead the direct benefits it
| confers fall on other users: (1) allowing your Go installation to
| participate in the system means other installations participate
| just a little bit less, thanks to sampling, and (2) allowing your
| system to send usage information strengthens the signal from
| others with similar usage. There is still an important indirect
| benefit: one system opted out won't have much of an impact, but
| 99% of systems opted out has a huge impact, and that leads to
| mistakes like the ones I mentioned in the first blog post, which
| do make Go worse for you.
|
| > Like with the module mirror and checksum database, there are
| good privacy concerns to telemetry despite the clear benefits, so
| the design of transparent telemetry aims to address as many of
| those as possible. The bullet list in the GitHub discussion (also
| at the end of the blog post) enumerates the most important ones.
|
| > Most people leave defaults alone or make intuitive guesses
| about what they want. That's totally reasonable: no one wants to
| spend half an hour learning the details of each specific setting.
| But my goal for the system is that if I did spend half an hour
| explaining how the system worked, then the vast majority of users
| would agree with the default and see no reason to opt out. Of
| course, some people will always opt out on general principle, and
| perhaps there are others who would opt in to some systems but not
| this one. For those people, my goal is simply to make the opt-out
| as easy and effective as possible. That's why opting out is just
| an environment variable (GOTELEMETRY=off) or a single command (go
| env -w GOTELEMETRY=off), and there's a quiet period of at least a
| week after installation to give plenty of opportunity to opt out
| before there's any chance of data being sent.
|
| > I expect that this will not change your mind, and that you and
| a few others will still believe the telemetry should be opt-in.
| I accept that: I don't expect to convince everyone about this
| point. But I hope this helps explain how I am thinking about the
| decision.
| silisili wrote:
| Very much against this. Sure, it sounds innocuous enough, and the
| post gives reasons why. But I have 3,436 items in /usr/bin. What if
| -every- one of these started doing their own telemetry, their own
| envvars, etc?
|
| If we have to deal with telemetry, then I'd instead hope that
| there can exist a single telemetry systemwide interface. Not sure
| how that would be designed or implemented, but would be better
| than everyone doing their own bespoke thing. Plus easier for me
| to disable them all in one go.
| arccy wrote:
| Maybe the problem is: why do you need 3,436 binaries there?
| jodrellblank wrote:
| > " _What if -every- one of these started doing their own
| telemetry, their own envvars, etc?_ "
|
| What bad thing are you suggesting would happen if they did?
| Your computer and internet connection can't handle four
| thousand strings or four thousand HTTP POSTs, or four MB more
| disk space of telemetry libraries? I bet it can. This isn't a
| technical problem, it's a control and consent problem.
| Macha wrote:
| I wouldn't be so sure it's not also a technical problem: the
| limit for execve can be as low as 128 KB, which for 4,000
| strings gives a maximum of 32 characters per NAME=VALUE
| environment string.
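|
| The arithmetic behind that figure, as a sketch: divide the exec budget
| evenly among the variables, then subtract any per-string overhead that
| also counts against the limit. The 128 KB budget and 4,000 variables
| come from the comment; the overhead values (a 1-byte NUL terminator,
| and an 8-byte envp pointer on 64-bit systems) are assumptions that vary
| by platform, and the sketch ignores argv and any existing environment.

```go
package main

import "fmt"

// maxValueLen estimates how long each NAME=VALUE environment string can
// be if count variables share a budget-byte exec limit equally.
// perStringOverhead covers bytes that count against the limit but are not
// visible characters (assumed: NUL terminator, possibly an envp pointer).
func maxValueLen(budget, count, perStringOverhead int) int {
	if count <= 0 {
		return budget
	}
	n := budget/count - perStringOverhead
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	const budget = 128 * 1024 // the 128 KB figure from the comment
	fmt.Println(maxValueLen(budget, 4000, 0)) // 32: raw division
	fmt.Println(maxValueLen(budget, 4000, 1)) // 31: plus NUL terminator
	fmt.Println(maxValueLen(budget, 4000, 9)) // 23: plus NUL and a 64-bit pointer
}
```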
| jodrellblank wrote:
| Free disk space can be as low as zero, but we don't blame
| the tool makers for adding an extra 100Kb or 20MB, we blame
| the computer owner for not having enough disk space to
| install the thing they chose.
|
| Use wrapper scripts for each utility that run
| UTIL_TELEMETRY_OPT_OUT=1 util ...
|
| so the variables don't all need to be set at once.
| JohnFen wrote:
| > we don't blame the tool makers for adding an extra
| 100Kb or 20MB
|
| Honestly, I do. Code bloat is a real thing.
| schmichael wrote:
| > the vast majority of projects, even large ones that would
| benefit, stay away from telemetry.
|
| Nomad is one of these projects. We support a dizzying array of
| platforms (32-bit Intel Linux?!). We have no idea how popular our
| Consul service mesh integration is. Are bug reports a sign of use
| or just failed experiments? Is anyone running on macOS in
| production or just ephemeral dev agents?
|
| Surveys about this are just asking humans to do something
| computers can do better.
|
| Obviously privacy and consent are paramount concerns, but not
| only are they solvable, in open source they're fully auditable
| (and a fork could fairly easily maintain a patch that removes it
| outright).
|
| I think open source largely rejecting telemetry puts it at a huge
| disadvantage to proprietary and SaaS software where it is the
| norm. I'm very excited to see someone as thoughtful and
| well-reasoned as Russ Cox trying to move the status quo forward.
| wrldos wrote:
| This is an incorrect assertion.
|
| We have to ask for permission on our SaaS products to collect
| this data as it's not necessary to collect it for the product
| to function. The EU GDPR mandates this.
|
| Russ Cox is suggesting that there is no permission step and
| that the data is collected by default.
|
| That is the issue.
| ergonaught wrote:
| Everything involves tradeoffs.
|
| The times "we" (previous companies) tried to implement
| telemetry in open source non-SaaS products (as distinct from
| "projects"), we either got huge blowback or users/customers
| simply blocked it at the firewall (and security teams at major
| enterprises were unwilling to open holes anyway).
|
| The only workable solution I found was integrating this in a
| value-add way, so that something in the service/experience/etc
| was better for the user/customer as a result of enabling
| telemetry, without the dark pattern of making things
| intentionally awful/worse without it. We simply never got
| enough data to matter otherwise. But, again, that was products
| and not projects.
| smoldesu wrote:
| On the contrary, I'd argue that the tracing visibility you're
| looking at isn't inherently a software trait at all. It's a
| deployment feature, which is something you address at-cost when
| building a product, but almost never when building FOSS
| software. It's not that people in FOSS don't see the upsides
| to it, it's that those upsides are insignificant relative to
| the cost of sustained market research. It's easier to just...
| make stuff, and have companies plaster over the gaps when their
| interests align.
|
| Look at GNOME, which recently pushed for its users to
| contribute telemetry: https://linuxiac.com/gnome-survey-results/
|
| Nothing wrong with what they've done here, but we already had
| most of these metrics. Nothing was really learned, and it took
| Red Hat and a few thousand users to get here. For smaller-scale
| projects, imagine how much smaller the returns would be.
| bioemerl wrote:
| Honestly, this may be unpopular with hacker news, but just add
| your own telemetry. If people don't like it they can turn it off,
| and telemetry is essential for a good product.
|
| Do let people turn it off though please.
| msla wrote:
| > telemetry is essential for a good product
|
| No, it isn't, and the idea that it is is toxic.
| bioemerl wrote:
| Essential is probably not right, this is true.
|
| But I'm confident I'll make a better product with telemetry
| vs without.
| pcthrowaway wrote:
| I think maybe even making telemetry mandatory with an open
| license, and customizable with a support license might be a
| sustainable way to run an open source company.
|
| For many open source projects (anything with an attached
| business model), either telemetry or tight communication around
| usage patterns will be necessary to inform development. The
| latter of those two options consumes business resources.
| bioemerl wrote:
| Mandatory with an open license might shoot yourself in the
| foot when people fork your code.
|
| For me, the harshest you can go is to have telemetry and not
| prompt for the setting on start with the user opted in by
| default. Ideally, you ask on first load, and that's what I'd
| probably aim for.
|
| You can't ever try to force people to do anything in open
| source. They'll run right by you and make it do what they
| want it to do with or without you.
|
| I'd even imagine paying customers are broadly happier with
| telemetry than open source ones. And their needs are more
| important anyway.
| groestl wrote:
| > If people don't like it they can turn it off
|
| If I, perchance, encounter software I use phoning home without
| my explicit permission, it's done on my systems. Period.
| Spivak wrote:
| Well according to our telemetry 0% of users turn it off so it
| seems pretty popular.
|
| But more realistically what you gain in privacy you give up
| in having your voice heard by the devs. The decisions about
| the future of the product/project will be driven by the data,
| specifically the data from the kind of people who leave
| telemetry on.
| groestl wrote:
| See, I _am_ a dev. I run telemetry on my infrastructure, I
| analyse it and fix what's broken, and if necessary, try and
| get upstream fixed. Also, I'm not opposed to telemetry in
| general, but if a switch like this is turned on by default,
| trust is broken for good.
|
| Software which does any type of computing without its
| users' informed consent is classifiable as malware, mind
| you.
| JohnFen wrote:
| If 0% of your users disable it, that kind of screams
| there's something wrong with your opt-out mechanism. Is it
| broken? Hidden? Difficult to do?
|
| I mean, with any group of people, there will always be a
| percentage that will disable it. If the telemetry is
| popular, that percentage might be very small, but it would
| be non-zero.
| groestl wrote:
| Well, those that turned it off are not phoning home, so
| the rest will be 100% ;)
| jwilk wrote:
| I think you missed the joke.
| bioemerl wrote:
| Oh shit, I missed it as well.
| JohnFen wrote:
| Oh! That's what the whooshing sound over my head was!
| bioemerl wrote:
| That is fine, but in this case telemetry trades you (and
| other more hardline users) as a user for all the extra users
| you gain from instant crash reports, quick feedback, and
| generally better productivity.
|
| I would never personally make that trade-off, and would
| always put (disableable) telemetry in my projects.
| Lammy wrote:
| > telemetry is essential for a good product
|
| Up until now, you've had to make these design decisions on your
| own, relying only on perplexing intangibilities like 'taste'
| and 'intuition'.
| bioemerl wrote:
| I think it's somewhat silly to fly blind on the assumption
| that your taste is better than any real world observations
| you can make.
|
| Especially if you haven't had the chance to develop an
| intuition yet and are new to the field. Without data to
| correct you, how do you get better?
| waboremo wrote:
| Those design decisions were never made in a vacuum. They
| relied on telemetry (or, less scarily, user testing, user
| feedback, user research, etc) to figure out what works best.
| As a reminder, our intuition doesn't come from nowhere, but
| from centuries of survival and expectations. If you do not
| know what these expectations are, and if you do not know how
| your users interact with your product based on these
| expectations, you cannot make a good product. Certainly, you
| can make a product that appeals only to you, but how many of
| yous exist?
|
| The teapot is a great example. It didn't magically appear
| with handles and a spout and a place to hold leaves, but
| emerged after years of refinement based on usage. As shocking
| as that is, tea wasn't even discovered on purpose, let alone
| given a specific vessel right out of the gate.
|
| So yes, telemetry is essential. Taste is personal.
| gavinhoward wrote:
| @rsc, if you ever see this, your proposal here means that I will
| never use any software written in Go ever again, if at all
| possible.
|
| What others have said in this thread about telemetry becoming an
| "accelerant" will happen. Abuse will happen. Data will be put up
| for sale. IPs will be logged because users can't verify that
| they're not.
|
| The only thing users can verify is what is sent and to whom. And
| only if they run packet inspection. Most users don't.
|
| (Edit: I just realized that users may not even be able to tell
| who data is sent to because of proxies or the original collector
| selling the data.)
|
| I have no reason to believe your personal motives are anything
| but pure; however, this capability will not just be in your
| hands. It will be in the hands of anyone with less-than-pure
| motives.
|
| I applaud your efforts to make telemetry more transparent, but
| they are destined to fail.
|
| When it comes to figuring out how users use software, the only
| thing to do is legwork. Ask your users. Watch them if they'll let
| you do user studies. Pay non-users to use the software for a user
| study and put them through all situations, including rare ones.
|
| This is the same thing we programmers tell the police to do when
| the police whine about end-to-end encryption: do old-fashioned
| legwork. Why should we, as programmers, demand that of police
| when we give ourselves tools to violate the privacy of users in
| the exact same way that police want?
|
| Yes, that's right, the _exact_ same way. Telemetry is a backdoor
| on a private conversation between a user and a machine.
|
| Just do the work. I'm pretty sure Google has the money to do so.
|
| You may respond that this is for Open Source developers to get
| data on their users. Well, if those developers are hobbyists,
| they don't have time to crunch data, and they're probably
| scratching an itch. If they are not hobbyists, they are paid and
| should do the legwork.
|
| There is _no_ excuse for telemetry. Just do the work.
| icholy wrote:
| What world do you live in?
| gavinhoward wrote:
| A custom-built Gentoo that uses the Awesome Window Manager
| for a minimal install, builds Firefox from source, and uses
| OpenSnitch to sniff _everything_.
|
| My machine is locked down hard.
|
| Oh, and I checked what depends on Go on my machine. The one
| kicker was libcap, which won't depend on Go if I tell it not
| to build captree. So I did that.
|
| I uninstalled Docker.
|
| That leaves:
|
| * `arduino-builder` (for my custom keyboard).
|
| * Hugo (for my websites).
|
| * An unnamed program.
|
| * Gitea.
|
| Besides `arduino-builder`, I have a plan to get rid of all
| three of those. For two, Hugo and Gitea, I had already
| planned to. The unnamed program is harder, but someone has
| already done one. Unfortunately, it's in Go, so I'm going to
| have to do something else myself.
|
| `arduino-builder`, though, that's tough.
| Thaxll wrote:
| All browsers have telemetry in them, but he probably used
| netcat to post that message.
| userbinator wrote:
| Nope, nope, and more nope. You're not moving the Overton Window
| any more on me.
|
| In fact it seems there's a clear correlation between the quality
| of software and how much spyware there is embedded in it. It's
| often merely another way to justify unpopular changes with "but
| the data says so".
|
| IMHO if you want to collect any information, it should never be
| anything but opt-in, a conscious decision.
| geodel wrote:
| Well, it's not a humble opinion but a very strong one, which is
| fine since you want certain thing in certain way and nothing
| else will do.
| oneplane wrote:
| So how would you design a well-working system to make data-
| driven decisions rather than guesses? Most collection methods
| are notoriously bad, but partial collection is also bad since
| we now have to somehow put a weighing factor on presumed absent
| data, which turns choices into guesses again.
|
| I think this is a really hard problem, and simply trying to
| guess in the dark as to what people want isn't the smartest way
| to go about finding the path forward / priorities /
| improvements / defects.
|
| It also isn't something that we had in the past, because when
| we used to buy the IDE, buy the compiler, and then build
| software, sell that software, and let everyone know what cool
| tools we used, you'd have sales figures that would inform the
| creators how the tools were used. Now, the tools are available
| to everyone, anonymously, and everyone has an opinion on how
| well it works for them, but doesn't have the time to write a
| well-written report every time a release happens.
| JohnFen wrote:
| Telemetry or no, a certain amount of guessing is inescapable.
| mananaysiempre wrote:
| > IMHO if you want to collect any information, it should never
| be anything but opt-in, a conscious decision.
|
| Serious (general) question: How do you do that given a non-
| technical user population? Debian's opt-in popcon kind of
| manages to get a little bit of data from a fairly technical
| one, but nowhere near enough to estimate a low usage frequency,
| and it's the only opt-in program I'm aware of that gets
| anything usable at all. Given that I'm unwilling to implement
| an opt-out system, I don't really see a workable approach here
| at all.
| autoexec wrote:
| How many people are installing and using open source software
| but couldn't understand a pop-up explaining what data is
| collected and asking if they'd like to submit it? Is the non-
| technical nature of the user the problem or is it just that
| when you have an opt-in option most people make the choice to
| opt-out? That's the thing about respecting users by giving
| them choice, they get to say no. If they mostly say no, and
| you don't get enough data, that's the will of your users and
| therefore not really a problem.
| mananaysiempre wrote:
| > How many people are installing and using open source
| software but couldn't understand a pop-up explaining what
| data is collected and asking if they'd like to submit it?
|
| I've taught probability theory using randomized response[1]
| as an exercise problem, and while people can understand it
| given time and motivation, it's not immediately obvious. So
| I'm not exactly hopeful that a prospective Audacity,
| Blender, or even Free Pascal user (to take an arbitrary set
| of examples) would get what I mean if I say "I'm collecting
| no more than 10 bits of information about you using
| RAPPOR"[2], and I'm not willing to engage in comforting
| bullshit such as "all collected data is anonymous", as I've
| been all too close to situations where the difference
| between the two might be one between freedom and prison.
|
| > Is the non-technical nature of the user the problem or is
| it just that when you have an opt-in option most people
| make the choice to opt-out?
|
| Both, because confirmation dialogs, especially privacy-
| related ones, have been thoroughly poisoned in users'
| minds. But confirmation of obscure actions, however
| beneficial their consequences, is problematic in general--
| if I go on the street and ask people if they'd like
| caffeine in their tea or ascorbic acid in their apples, I
| expect (but have not checked) that the majority will say
| no, never mind that both are normally there and intrinsic to
| the experience.
|
| (The possibility of meaningful consent from a non-
| specialist is the subject of much discussion and few good
| answers in med school, or so I've heard.)
|
| Whether the ultimate answer is to grant or deny permission,
| I'm not sure I can present the question in a way that will
| actually have it made on the basis of merit and not on
| "scary permission dialog, better say no" or "yes, yes, just
| let me through to my dancing bunnies[3]" or "yes, if I say
| no the installer will just tell me to GTFO".
|
| (In that respect the "Send crash report to vendor" button
| is unexpectedly good, because you're not actually
| interposing yourself between the user and any prospective
| bunnies. But personally I don't like to spend time and
| effort in order to send "feedback" into an unmarked hole
| where I've no idea if anybody will ever look at it. From
| that point of view, it is background data collection that's
| unexpectedly good.)
|
| And even if, for the purposes of this question, it would be
| best if people took the time to learn the necessary maths,
| computing, and operational security to make an informed
| choice, in reality I'm not sure that's the best thing they
| can spend their life on.
|
| So it may be the answer is that you simply can't do
| telemetry well for the social reason that users won't ever
| end up making an informed choice, or that the well has been
| poisoned so thoroughly that the rational choice is to
| reject everything. It's just that I know that it's
| basically possible in a technical sense, so I don't want to
| give up that easily.
|
| [1] https://en.wikipedia.org/wiki/Randomized_response
|
| [2] https://blog.cryptographyengineering.com/2016/06/15/what-is-...
|
| [3] https://blog.codinghorror.com/the-dancing-bunnies-problem/
| JohnFen wrote:
| What I hear you saying here is that people don't do what you
| want if you give them the choice, so you lean towards not
| giving them the choice rather than respecting their wishes.
|
| Is my interpretation correct?
| mananaysiempre wrote:
| >> I'm unwilling to implement an opt-out system
|
| > [Y]ou lean towards not giving them the choice rather than
| respecting their wishes.
|
| > Is my interpretation correct?
|
| I don't think it is, no :) Rather, I'm not sure how to
| sell, to put it crassly, users on a choice when properly
| investigating or even being confronted with that choice
| would delay them seeing the dancing bunnies[1], but that
| would also, if I have any say about it, improve the bunnies
| in the future.
|
| Does that mean there's a shade of "I know better" in my
| problem statement? Of course it does, if I didn't know
| better than the average user I'd have no business designing
| such choices. I don't think there's anything wrong about
| that, better than the average at an activity few practice
| is not a terribly high bar. Not giving the users a choice
| or manipulating them into making the one I think is right
| would absolutely be wrong, though.
|
| Basically, how do I make the user think, how do I give them
| the appropriate data to do so, and how do I deal with the
| obvious contradiction of that goal with principles of good
| design[2]? The potential benefits to the software and
| (thereby) the users are too much to give up without even
| asking those questions.
|
| (See nearby comment for extended discussion.)
|
| [1] https://blog.codinghorror.com/the-dancing-bunnies-problem/
|
| [2] https://sensible.com/dont-make-me-think/
| JohnFen wrote:
| Thank you for the thoughtful response. We disagree on
| much, but I respect your opinion nonetheless.
|
| > Not giving the users a choice or manipulating them into
| making the one I think is right would absolutely be
| wrong, though.
|
| I'll pull out just this point, though, to perhaps
| illustrate how different our worldviews are. I consider
| opt-out to be a manipulative approach.
| mananaysiempre wrote:
| > I consider opt-out to be a manipulative approach.
|
| So do I, which is why I wrote I'm unwilling to implement
| it :) The original (and, to be clear, purely theoretical)
| point was, opt-out is too manipulative while opt-in is
| likely useless.
|
| Ah shoot. Did you take that to mean that I'm unwilling to
| implement an off switch at all? That wasn't it, sorry for
| the confusion.
| JohnFen wrote:
| Perhaps we aren't so far apart after all.
|
| The struggle is real. As a developer, more data is
| obviously desirable and can make development much easier.
| I just can't think of a way to do telemetry that, if I
| were a user, I would accept. And I don't want to produce
| software that I wouldn't personally use.
|
| I just don't know how to have my cake and eat it too.
| rom-antics wrote:
| Ask for consent during setup or on first run. Syncthing does
| this and they get plenty of usable data. It's even public:
| https://data.syncthing.net/
| caniszczyk wrote:
| We need solutions in this space for open source projects, I've
| been monitoring https://divviup.org as an option too!
| ddevault wrote:
| This is not okay. The only ethical way to do telemetry is _opt-
| in_. If not enough people are opting in, you need to incentivize
| them to -- most simply by just paying them for their data. After
| all, telemetry is "valuable", isn't it? But if you can't figure
| out how to convince people to opt-in, then tough luck, sucks to
| be you.
|
| Opt-in or GTFO, Google. I'll be patching this out of the Alpine
| package for Go the day it ships.
| gavinhoward wrote:
| You and I may not agree on a lot, but I sure agree with you on
| this one.
| mordae wrote:
| I dunno. It sure makes sense to me to collect telemetry from free
| software installations, but I feel that having every platform, or
| even every piece of software, do it on its own with opt-out will
| inevitably lead to people being overwhelmed and angry.
|
| I would, personally, prefer a single non-profit service that
| would list publicly what is being collected and publish the
| results as open data for anyone to use. Applications (at least on
| Linux) would not submit their reports directly, but would use a
| local relay service that could be turned off completely or that
| could filter what reports to send to the server and what to
| /dev/null.
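|
| A minimal sketch of the filtering step in such a relay, assuming each
| report is tagged with an application name and a metric name. The
| Report and Relay types and the allowlist layout are hypothetical; the
| point is only that one system-wide switch and one user-controlled list
| decide what leaves the machine.

```go
package main

import "fmt"

// Report is a hypothetical telemetry record submitted by a local application.
type Report struct {
	App    string
	Metric string
	Value  int64
}

// Relay models the proposed local relay: a single system-wide switch
// plus a per-(app, metric) allowlist chosen by the user.
type Relay struct {
	Enabled bool
	Allow   map[string]map[string]bool // app -> metric -> forward?
}

// Filter returns the reports that would be forwarded upstream; everything
// else is dropped, the equivalent of sending it to /dev/null.
func (r *Relay) Filter(in []Report) []Report {
	if !r.Enabled {
		return nil
	}
	var out []Report
	for _, rep := range in {
		// Indexing a missing app yields a nil map, which reads as false.
		if r.Allow[rep.App][rep.Metric] {
			out = append(out, rep)
		}
	}
	return out
}

func main() {
	relay := &Relay{
		Enabled: true,
		Allow: map[string]map[string]bool{
			"editor": {"crashes": true},
		},
	}
	reports := []Report{
		{App: "editor", Metric: "crashes", Value: 2},
		{App: "editor", Metric: "open-files", Value: 40},
		{App: "shell", Metric: "commands", Value: 900},
	}
	for _, rep := range relay.Filter(reports) {
		fmt.Printf("forward %s/%s=%d\n", rep.App, rep.Metric, rep.Value)
	}
}
```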
|
| Distributions and other software stores would then make it
| mandatory for software to use this relay and either patch out any
| other telemetry from their packages or straight out forbid those
| that would not comply.
| gen220 wrote:
| I think the issue of telemetry is fundamentally a human issue
| of incentives and trust. The system you describe is wise
| because it recognizes this and attempts to address it.
|
| The difficulty with telemetry is that even if we design the
| perfect, privacy-preserving system to begin with, once the
| pattern of having a network port open is established, there's
| nothing to prevent us (humans) from changing our policies about
| what we're allowed to push/pull over that port.
|
| In real-world analogues for these kinds of thorny policy
| problems, we have centralized arbiters to solve these problems.
| That might be a fruitful course of research for people
| interested in this problem to explore.
|
| Unfortunately, even though this problem has software as its
| medium, it is a problem that cannot be solved by clever
| software alone, despite any appearances to the contrary.
| JohnFen wrote:
| > The difficulty with telemetry is that even if we design the
| perfect, privacy-preserving system
|
| The other difficulty is what you mentioned: trust. Even if a
| piece of software really does telemetry in a perfect,
| privacy-preserving way -- as a user, I have to take the
| developer's word for that in the end. That's a hard hurdle to
| pass, because that trust has been violated so much in the
| past that nobody gets the benefit of the doubt anymore.
|
| > Unfortunately, even though this problem has software as its
| medium, it is a problem that cannot be solved by clever
| software alone
|
| I agree entirely. At the heart of it, this is not a
| technological problem. It's a human one.
| colesantiago wrote:
| And there it is. The real intentions of Google and the Go
| Programming Language.
|
| Google really can't help themselves, to stick telemetry in
| anything.
| kardianos wrote:
| This is well done. It only exposes counters, and rather than
| pushing data up, the telemetry server must know the names of what
| it can ask for. No wildcards.
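|
| A sketch of the pull model described here, under stated assumptions:
| counters live locally as name/value pairs, the server publishes an
| explicit list of counter names it wants, and the client returns only
| exact matches, ignoring any requested name containing '*'. The function
| name and the counter names are made up for illustration and are not
| the real implementation's API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectCounters returns only the local counters whose names appear
// verbatim in the server's requested list. Requested names containing
// '*' are ignored, so the server cannot ask for "everything".
func selectCounters(local map[string]int64, requested []string) map[string]int64 {
	out := make(map[string]int64)
	for _, name := range requested {
		if strings.Contains(name, "*") {
			continue // wildcards are not honoured
		}
		if v, ok := local[name]; ok {
			out[name] = v
		}
	}
	return out
}

func main() {
	local := map[string]int64{
		"go/invocations":    120,
		"gopls/crash":       1,
		"compile/flag:race": 7,
	}
	requested := []string{"go/invocations", "*", "gopls/*", "not/present"}
	got := selectCounters(local, requested)

	// Print in a stable order.
	names := make([]string, 0, len(got))
	for name := range got {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, got[name])
	}
}
```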
| bombela wrote:
| The information and rate of upload as described seem reasonable.
|
| Is the fear from most people that it will be a foot in the door?
| And a way for Google to collect extra data over time?
|
| Note: I think Go is a regressive technology. It would have been
| great in the 1970s. Not today. But that's a different topic. My point
| is that I tend to be biased very negatively against Go. But here
| I don't see something wrong.
| throw7777 wrote:
| Interested as to why you believe go is regressive, could you
| expand on that?
| 1vuio0pswjnm7 wrote:
| "The system is on by default, but opting out is easy, effective,
| and persistent."
| Arnavion wrote:
| I haven't worked with golang in some time. How do golang devs
| generally obtain the compiler?
|
| If you're getting it from distro repos, it should be
| straightforward to convince the distro package maintainer to
| disable the telemetry / patch it out.
|
| Or is it an nvm/pyenv/rustup situation where you prefer to use
| bespoke toolchain managers to download upstream's compilers?
| aliyeysides wrote:
| It depends on which compiler you want to use, but precompiled
| binaries include the gc compiler by default. If you want to
| compile from source yourself, you can use gc or gccgo assuming
| you already have the go toolchain, otherwise you would need to
| bootstrap from an existing binary.
| badrequest wrote:
| I mainly get it straight from golang.org, but this can be
| disabled via an environment variable, just like the module proxy
| stuff. https://research.swtch.com/telemetry-design#opt-out
| sidewndr46 wrote:
| Running the command from that page just shows me the message
| "go: unknown go command variable GOTELEMETRY"
| 4ad wrote:
| Because this proposal has not yet been implemented.
| sidewndr46 wrote:
| So is there a way to disable this ahead of time? Or do I
| have to install the version with telemetry first?
| mseepgood wrote:
| Yes, you can set an environment variable at any point in
| time.
| rsc wrote:
| You can also run `echo GOTELEMETRY=off >> $(go env GOENV)`
| creepycrawler wrote:
| If they add "telemetry" my response would not be to set an
| environment variable, but to uninstall golang. I used it a few
| years ago, both personally and in a work setting, but I'll do so
| no more in the future. Just my opinion.
| xyzzy_plugh wrote:
| I'm usually against telemetry but not only is the approach here
| somewhat reasonable, I think I actually trust Google more than,
| say, homebrew to not do something egregious with the data.
|
| Google is at least as broadly compliant as one can be with
| various standards (of questionable value, natch) but is also on
| the hook socially and perhaps legally if they fuck this up.
| bluehazed wrote:
| > I think I actually trust Google
|
| shilling as transparent as russ cox' telemetry
| photochemsyn wrote:
| This is perhaps unintentionally amusing:
|
| > To be clear, I am only suggesting that the instrumentation be
| added to the Go command-line tools written and distributed by the
| Go team, such as the go command, the Go compiler, gopls, and
| govulncheck. I am not suggesting that instrumentation be added by
| the Go compiler to all Go programs in the world: that's clearly
| inappropriate."
|
| Well that dispels any lingering thoughts I might have had about
| ever using golang for anything (not many to be sure). Someone
| feels the need to assure everyone that they won't be stuffing
| telemetry code into every binary their compiler produces? Google
| just wants all the data about everyone everywhere all the time...
|
| https://www.komando.com/security-privacy/ways-google-invades...
| cube2222 wrote:
| Probably related to [0].
|
| To anybody complaining that this should be opt-in: opt-in
| telemetry doesn't work. The reason for this is that most people
| don't care, but they don't care either way. They don't disable it
| when prompted, nor would they enable it manually.
|
| The idea of telemetry is being able to prioritize the work that
| will be most widely useful. For this you need a good and balanced
| sample of your users. You don't really get any kind of sensible
| sample if you only do it opt-in. Additionally, this ship has long
| sailed, everybody does opt-out.
|
| What I do think, however, is that it should very clearly notify
| the user of this and give them an easy way to disable it, as
| OctoSQL [1] does (disclaimer: I'm its author): it prompts you on
| first run and shows explicitly how to disable it.
|
| All things considered, this is an open source project, so you're
| free to maintain a fork without telemetry. The Go toolchain also
| uses the Google-hosted module proxy by default, which really is a
| bit like telemetry already.
|
| [0]: https://news.ycombinator.com/item?id=34707583
|
| [1]: https://asciinema.org/a/eWQsyXQKi1fmithyTekAD5fWS
| wrldos wrote:
| But that's wrong. There is no place for this in a civilised
| society:
|
| _"If we ask, everyone is going to say no, so we will steal it
| unless someone tells us not to"_
| BonoboIO wrote:
| I think the comparison of telemetry and stealing is pretty
| harsh.
|
| Is opt-out telemetry unethical? It depends. If you use it in
| a privacy-preserving way, no; if you spy on your users or sell
| the data for money or advertising, it obviously is unethical.
|
| The hard truth is, nobody reads the manual. Opt-in telemetry
| usually reaches only a minority of users, and you then work with
| niche data from that minority, which skews your development in
| certain ways.
| wrldos wrote:
| But is that not the decision of the person who owns the
| data?
| JohnFen wrote:
| It really all boils down to meaningful consent.
|
| > if you spy on your Users
|
| In my opinion, any data collection about me or my machines
| that occurs without my active informed consent is "spying".
| This is my fundamental problem with opt-out mechanisms.
| They do not indicate or imply that active consent was
| obtained.
| Terretta wrote:
| > _What I do think however, is that it should very clearly
| notify the user of this, and give them an easy way to disable
| it._
|
| You make a good point.
|
| As a for instance from a popular Mac-based package manager that
| (unexpectedly for many) defaults to telemetry from your CLI:
|
| `brew analytics off` is not hard to type after installing
| homebrew, but the installation text doesn't mention that;
| instead it points to a web page you have to read about how
| wonderful the analytics are before eventually finding the
| incantation:
|
| https://docs.brew.sh/Analytics
|
| I wonder how many people care enough to click that link, read
| all the "analytics are actually good for you" copy, and then
| change their mind to leave it on. I'm guessing almost zero?
|
| But perhaps most users won't copy and paste the link, whereas if
| it just suggested `brew analytics off`, many users would type
| it.
| marginalia_nu wrote:
| > The idea of telemetry is being able to prioritize the work
| that will be most widely useful.
|
| It does sort of hinge on the highly suspect assumption that
| usefulness is correlated with use. An obvious counter-example
| to this is something like a fire extinguisher, which in the
| ideal case just sits on a wall until its use-by date passes
| and is then discarded, having never been used; or, on the flip
| side, an incredibly byzantine workflow that could be reduced to
| something much simpler will appear important and useful.
|
| Even without these edge cases, interpreting statistics is
| really hard. Even people with PhDs who have studied these
| things for years still get them wrong all the time.
|
| What ends up happening more often than not is it's used as a
| tool to quiet the critics when pushing through unpopular
| changes.
| cube2222 wrote:
| All this boils down to "an unskilled engineer will
| misinterpret data even if they have it". I'll assume the Go
| team knows what they're doing, based on their track record so
| far.
|
| There are a lot of very simple questions you can answer very
| reliably, too, like "what proportion of the users are still
| using a certain compatibility flag".
| masklinn wrote:
| > I'll assume the Go team knows what they're doing, based
| on their track record so far.
|
| Funny, I'd assume the exact opposite. After all much of the
| understanding of privacy and statistics at scale was
| developed after 1980.
| marginalia_nu wrote:
| My point is that scientists whose job is to interpret data
| and construct experiments get this wrong on a regular
| basis, despite years of training in constructing
| experiments and interpreting data, despite peer review,
| despite staking their career and reputation on not making
| these kinds of mistakes. They still happen! A lot!
|
| Interpreting data is very hard.
| nine_k wrote:
| Most software features are not like fire extinguishers.
|
| More than that, the interesting stats may be not even around
| user-visible features, but around internal mechanisms, like
| some cache hit rate, or how often some branch in the
| compiler is invoked.
|
| As long as stats are clearly inspectable, reasonably
| anonymized, and are opt-out, I'd be fine with sending them.
| marginalia_nu wrote:
| The features that are like fire extinguishers are the ones
| most likely to be unjustly removed with the rationale of
| looking at telemetry.
|
| See, for example, Mozilla's bizarre decision to remove the
| ability to override the character encoding of a webpage in
| favor of some half-baked detector.
| bitwize wrote:
| There are essential, fire-extinguisher-like features. The
| canonical example is the joke about backup software: if it
| were developed according to today's standard of telemetry-
| driven engagement analytics, the restore from backup
| functionality would be removed because it's used so
| infrequently.
|
| This actually happens sometimes: when developing the demo
| ".kkrieger", a first-person 3D shooter in 96 KiB, demogroup
| theprodukkt tried to shrink it down to get it under the 96
| KiB wire. One of the tricks they used was using a profiler
| to identify code sections that were never reached and could
| be removed. One of the sections they removed was the
| handler for the up arrow key in the main menu, simply
| because the test player never pressed up in the menu.
|
| If you think that Google or another large software
| organization won't misuse telemetry by cutting or
| neglecting important but infrequently used functionality to
| hit some KPI... have you ever _worked_ in a large software
| organization?
|
| All stats can be deanonymized. The more data you make
| available, the more you identify yourself. I do not need
| software I use stealthily tying up bandwidth by "phoning
| home" with data about me. It is simultaneously betrayal and
| resource theft. If I wanted to contribute to the
| improvement of the software, I'd file a bug report.
| nine_k wrote:
| I think your view of the ways usage stats are used is a
| bit simplistic. Not everyone removes "underused" features
| without giving some consideration, even in big
| corporations.
|
| But since you clearly don't like telemetry, you should
| have a way to reliably switch it off. Here we are on the
| same page: there must be a well-documented and easy way
| to switch any telemetry off.
| BonoboIO wrote:
| > Most software features are not like fire extinguishers.
|
| Amen
|
| Sometimes a pretty niche feature used by one loud individual
| gets more attention than a feature used by thousands or
| millions of silent users.
| omginternets wrote:
| So it sounds like we can't have telemetry?
| saurik wrote:
| The argument for this being opt-in isn't about "it works
| better", it is about it being ethically correct. There are a
| ton of things that "don't work" unless you do something
| unethical: that doesn't mean they are OK, it doesn't mean they
| should be tolerated, and it doesn't mean the people who do them
| --and, at the end of the day, it is people who make these
| decisions: there is a human being who refused to say "no" and
| whose name we might even be able to find out--shouldn't be
| judged by their peers for doing these things... and that is all
| true even if it is (currently) legal for them to do it!!
| rcme wrote:
| What is the argument for opt-out telemetry being unethical?
| masklinn wrote:
| Because that goes against informed consent.
|
| Opt-out is generally rejected by European privacy laws.
| remus wrote:
| > Opt-out is generally rejected by European privacy laws.
|
| ...where personal data is involved.
|
| It strikes me that this proposal goes to considerable
| lengths to avoid collecting anything that could be
| considered personal data.
| _ph_ wrote:
| Because it most likely means that people are sending data
| without their consent. Perhaps I am naive or just very old,
| but I wouldn't expect a compiler to "phone home" with
| information about what I do with it. Certainly not without
| me expressing consent first.
|
| So if you want that information, find a way to ask the user
| first. If you can give a good and understandable
| explanation on how the information is useful, the users
| might give their consent happily.
| delusional wrote:
| I don't think it's all telemetry. I suppose telemetry could
| be designed in a way that preserves the users privacy to an
| extent that is compatible with their native assumption. I
| suppose that design also depends on what you're building.
|
| If you're building a website, I think it's fairly
| reasonable for you to store my IP. That's inside my
| expected privacy loss when dealing with a remote party. I
| have to connect to your computer, much like I have to
| physically walk into a store. I don't mind you remembering
| that I was there. Running a compiler on the other hand
| feels more "private" to me somehow. My expectation when
| using a compiler is that it won't send anything to anyone,
| because why would it?
|
| In general I think our industry is starved for relevant and
| foundational ethics research, outside of the FSF at least.
| cube2222 wrote:
| You're framing this as though the "ethical" choice were
| obvious, or that there was a person who "knew this was the
| ethical thing to do, but turned a blind eye".
|
| I disagree, I think it's a very contested topic, with lots of
| discussion whenever it's raised here, with either side
| possibly being a vocal minority.
| atahanacar wrote:
| [flagged]
| philosopher1234 wrote:
| This is a very extreme position. It's hard to take you
| seriously when your comment doesn't have any nuance.
| atahanacar wrote:
| Finding the collection of a person's data without consent
| unethical is not an "extreme position". Since when is
| "consent", or more correctly "autonomy of the individual",
| called "extreme"? If you did the same thing in my field
| (medicine), you would lose your license.
| philosopher1234 wrote:
| I agree, that's not what I think was extreme about your
| position. I think you've invented straw men in this
| comment and your previous comment.
| atahanacar wrote:
| Reading your comment again, I can see it now. I
| misinterpreted "knew this was the ethical thing to do,
| but turned a blind eye" as "knew there was an ethical
| problem, but turned a blind eye".
|
| Turns out, my straw man can't read.
| philosopher1234 wrote:
| Props to you for saying so publicly! I'm not sure if
| you're unusually open or if I just found the right words
| to persuade you, but this is a first for me :)
| JohnFen wrote:
| I agree that opt-out is a Bad Thing, but I disagree with
| this stance. And I think lots of people in the pro-
| telemetry camp see that there's an ethical issue to be
| discussed, but they reach a different conclusion. They
| shouldn't be dismissed so glibly.
| atahanacar wrote:
| Reaching a different conclusion is one thing, but not
| seeing a dilemma is another. One can always argue that
| invading a person's autonomy might be necessary given the
| benefits but seeing no issue is just turning a blind eye.
| JohnFen wrote:
| In the Golang announcements, it's clear that they
| completely see and understand the dilemma, and have
| provided a lengthy explanation of why they decided for
| opt-out anyway.
|
| I respect that. I don't agree with the decision, but it
| was made with understanding and thought.
| atahanacar wrote:
| I made my original comment misunderstanding what the
| parent comment meant as "not knowing an ethical problem
| exists". I also am not talking about this specific
| decision, but criticizing ethical decision making in the
| tech industry in general.
|
| In ethics, there are no right or wrong answers (mostly),
| just right and wrong methodologies. If you go the
| pragmatic way, you'd argue that the benefits of telemetry
| are greater than the downsides and implement it. If you
| go Kant's way, you would already have a maxim (either
| "never invade privacy" or "prioritize technical benefits
| regardless of the users" in this case) and act according
| to that maxim regardless of the situation. If you go the
| intent way, all that matters is whether your intent for
| the action is good or bad, in contrast if you go the
| outcome way, all that matters is the outcome regardless
| of the intent or the methods.
|
| These are all "valid" ways to discuss an ethical dilemma.
| However, one must always acknowledge the dilemma. This
| industry, especially big tech, seems to ignore this quite
| often, mostly because it's very easy to see people as
| "just numbers" when you don't see them directly. Don't
| even get me started on lawmakers who are also ignoring
| this whole issue. Many standard practices in this
| industry would be straight up illegal in lots of other
| areas, especially where there is face-to-face contact.
| msla wrote:
| > opt-in telemetry doesn't work.
|
| Then don't do telemetry at all.
|
| Google doesn't have a right to data.
| geodel wrote:
| Yeah, but they do have the right to modify projects that they
| sponsor as they see fit.
| the_gipsy wrote:
| > this is an open source project, so you're free to maintain a
| fork without telemetry.
|
| That option is a joke. The real alternative is rust - or any
| non-corporate platform that isn't gonna pull these kinds of
| stunts.
| JohnFen wrote:
| > Additionally, this ship has long sailed, everybody does opt-
| out.
|
| This is a meaningless point.
| cwkoss wrote:
| > opt-in telemetry doesn't work
|
| That's too bad. Guess you don't get any telemetry data if you
| want to develop ethical open software.
|
| The answer isn't to bend your ethics.
|
| If I take a dollar from everyone but it's opt-out, that's just
| theft with extra steps.
|
| If I make it opt in, nobody is going to give me the dollar, but
| that doesn't make opt out morally justifiable.
| philosopher1234 wrote:
| Why is it unethical? People in every day life constantly
| assume things about one another and take actions that affect
| one another without asking first, and this is a practical
| necessity, so if the reason is that "you may not do anything
| without me telling you it's ok", I don't think that's a defensible
| position.
| mh7 wrote:
| Any evidence that telemetry actually works? (i.e makes the
| program better)
| msla wrote:
| > opt-in telemetry doesn't work.
|
| Opt-out telemetry won't work when people send false data to the
| servers.
| gowld wrote:
| Why does the dev team need to optimize a use case that the user
| _doesn't want_ optimized?
| Aaronontheweb wrote:
| How transparent is Scarf's product adoption metrics for OSS
| projects? https://about.scarf.sh/
|
| I follow them on Twitter but haven't looked much into it other
| than reading their documentation, which makes me think that most
| of their telemetry is done at the point of the package
| distribution system: https://about.scarf.sh/package-sdks
| bluehazed wrote:
| This is really slimy, Google swung and missed and let Go of the
| bat here.
| wrldos wrote:
| Imagine if GNU started adding telemetry to their compiler
| toolchain...
|
| If that sounds fucking stupid, which it does, then so does this.
| hommelix wrote:
| Telemetry in open source has existed for a long time. Debian has the
| popcon package that can be installed and reports weekly usage of
| the software packages. The telemetry data are published in the
| open. The Debian popcon FAQ could be used as a guideline for other
| telemetry needs. https://popcon.debian.org/
| msla wrote:
| > Debian has the popcon package that can be installed and
| reports weekly usage of the software packages.
|
| Right. That's fully opt-in, to the point the package isn't even
| installed by default, which is the only moral way to do this.
| ptx wrote:
| It does sound quite similar. But in addition to the crucial
| difference of opt-in vs. opt-out there's also an interesting
| contrast in how it's framed.
|
| Debian talks about what you, the user, can do: help out,
| participate and vote. If you choose to do so.
|
| The Go team talks about what the developers and their software
| will do to the user's machine, but the user is completely
| passive in their description. This is also reflected in the
| term "telemetry" itself: the software is not a tool in the
| user's hands but rather a remote-controlled probe in the user's
| habitat that pokes at the user to elicit interesting responses.
| agwa wrote:
| popcon should not be used as an example of how to do telemetry,
| as it is far worse for privacy than the Go proposal:
|
| 1. Sends names of private packages to the server, and publishes
| them.
|
| 2. Sends a unique identifier (a UUID stored in
| /etc/popularity-contest.conf) to the server, which is stored.
|
| 3. Doesn't use sampling, so if you use popcon you will be
| submitting a report once a week (Go's telemetry would average
| just one report a year).
|
| 4. Submits over plaintext protocols by default.
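To see how sampling gets from popcon's weekly reports down to roughly one report per system per year, here is a small simulation. It assumes each weekly report is independently uploaded with probability 1/52, so the expected count is 52 * (1/52) = 1 per year. The fixed rate and the function names are illustrative assumptions; the Go design describes collecting a fixed number of samples overall rather than a fixed per-system rate.

```go
package main

import (
	"fmt"
	"math/rand"
)

const weeksPerYear = 52

// shouldUpload decides, once per weekly report, whether this system
// uploads it. The rate is an illustrative assumption chosen so that a
// system averages one upload per year; a real deployment could let the
// server tune it to collect a fixed number of samples.
func shouldUpload(r *rand.Rand, rate float64) bool {
	return r.Float64() < rate
}

// simulate returns the mean number of uploads per system over one year.
func simulate(systems int, rate float64, seed int64) float64 {
	r := rand.New(rand.NewSource(seed))
	total := 0
	for s := 0; s < systems; s++ {
		for w := 0; w < weeksPerYear; w++ {
			if shouldUpload(r, rate) {
				total++
			}
		}
	}
	return float64(total) / float64(systems)
}

func main() {
	rate := 1.0 / weeksPerYear
	fmt.Printf("expected uploads/year: %.2f\n", weeksPerYear*rate)
	fmt.Printf("simulated uploads/year: %.2f\n", simulate(100000, rate, 1))
}
```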
|
| popcon may be opt-in (in the sense that the prompt during
| installation has "No" selected by default) but the prompt
| doesn't disclose the large privacy risks.
|
| People are not appreciating the thought that has gone into the
| Go proposal to minimize the collection of private data, either
| intentionally or by accident, such as the client-enforced
| requirement that the names of counters be published in a
| tamper-proof log so anyone can verify that, for example, no
| private package names are being disclosed. Everyone is focusing
| on opt-in vs opt-out, but to me these other details are far
| more important.
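One way a client could enforce "only accept counter names that were published in a tamper-proof log" is to require a Merkle inclusion proof for the configuration it downloads. The sketch below uses Certificate Transparency style hashing (RFC 6962) and the audit-path check described in RFC 9162. Treating each configuration as a log entry, and every name here, are assumptions for illustration, not a description of how the Go proposal implements its log.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use RFC 6962 domain separation:
// 0x00 prefixes leaves, 0x01 prefixes interior nodes.
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func nodeHash(left, right []byte) []byte {
	buf := make([]byte, 0, 1+len(left)+len(right))
	buf = append(buf, 0x01)
	buf = append(buf, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// verifyInclusion checks that leaf is entry index in a log of treeSize
// entries whose root hash is root, following the RFC 9162 algorithm.
func verifyInclusion(leaf []byte, index, treeSize uint64, proof [][]byte, root []byte) bool {
	if index >= treeSize {
		return false
	}
	fn, sn := index, treeSize-1
	r := leafHash(leaf)
	for _, p := range proof {
		if sn == 0 {
			return false // proof is longer than the tree is tall
		}
		if fn&1 == 1 || fn == sn {
			r = nodeHash(p, r)
			// Skip levels where this node was the last, unpaired node.
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, root)
}

func main() {
	entries := [][]byte{
		[]byte("config v1: go/invocations"),
		[]byte("config v2: go/invocations go/build/flag:-race"),
		[]byte("config v3: go/invocations gopls/client:vscode"),
	}
	h0, h1, h2 := leafHash(entries[0]), leafHash(entries[1]), leafHash(entries[2])
	// For three leaves, RFC 6962 splits the tree as ((0, 1), 2).
	root := nodeHash(nodeHash(h0, h1), h2)

	// The log serves an audit path for entry 2; the client recomputes the root.
	fmt.Println(verifyInclusion(entries[2], 2, 3, [][]byte{nodeHash(h0, h1)}, root)) // true
	// A configuration that was never logged fails verification.
	fmt.Println(verifyInclusion([]byte("config v3: go/*"), 2, 3, [][]byte{nodeHash(h0, h1)}, root)) // false
}
```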
| r2vcap wrote:
| I have set DOTNET_CLI_TELEMETRY_OPTOUT=1 as an environment
| variable in my .profile file. What should I do for golang?
| guessmyname wrote:
| > _To opt out, users would set GOTELEMETRY=off in their
| environment or run a simple command like go env -w
| GOTELEMETRY=off. The first telemetry report is not sent until
| at least one week after installation, giving ample time to opt
| out. Opting out stops all collection and reporting: no "opt
| out" event is sent. It is simply impossible to see systems that
| install Go and then opt out in the next seven days._
|
| Source: https://research.swtch.com/telemetry-intro
| omginternets wrote:
| I see a very frustrating pattern emerging in which $COMPANY asks
| its users if it can do something, the users say "no", and
| $COMPANY storms off under the guise that "the discussion is
| unproductive".
|
| I am left with the impression that the decision has already been
| made, and that we are witnessing a PR strategy to make Google
| appear reasonable. I think that Mr. Cox, with all the respect I
| hold for him, is playing the part of the "useful idiot" here.
| taveras wrote:
| I wish there was a standard way of disabling telemetry across
| software dependencies.
|
| While I leave it turned on for personal projects, several
| projects at work require disabling it.
|
| I have spent hours auditing through transitive dependencies to
| turn it off. It should not be this painful.
| userbinator wrote:
| A firewall is probably your best bet. Don't allow network
| traffic originating from anything other than a short whitelist.
| slimsag wrote:
| No reason they couldn't all aim to respect a `TELEMETRY=false`
| env var, akin to the web's 'do not track' request.
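If tools did converge on a shared variable, each one would still need to honor its legacy switches. A hypothetical Go helper follows. `TELEMETRY` is the convention proposed in the comment above, not an existing standard, while `GOTELEMETRY=off` and `DOTNET_CLI_TELEMETRY_OPTOUT` are the tool-specific variables mentioned elsewhere in this thread. The set of accepted values is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// optedOut reports whether any known opt-out variable is set. TELEMETRY
// is the hypothetical cross-tool convention; the other two are existing
// tool-specific variables. lookup is os.LookupEnv in real use and a stub
// in tests.
func optedOut(lookup func(string) (string, bool)) bool {
	if v, ok := lookup("TELEMETRY"); ok {
		switch strings.ToLower(strings.TrimSpace(v)) {
		case "false", "0", "off", "no":
			return true
		}
	}
	if v, ok := lookup("GOTELEMETRY"); ok && v == "off" {
		return true
	}
	if v, ok := lookup("DOTNET_CLI_TELEMETRY_OPTOUT"); ok && (v == "1" || strings.EqualFold(v, "true")) {
		return true
	}
	return false
}

func main() {
	fake := map[string]string{"TELEMETRY": "false"}
	stub := func(k string) (string, bool) {
		v, ok := fake[k]
		return v, ok
	}
	fmt.Println(optedOut(stub))         // true
	fmt.Println(optedOut(os.LookupEnv)) // depends on your environment
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the check testable without mutating the process environment.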
| ComodoHacker wrote:
| Does anyone really respect DNT though?
| bee_rider wrote:
| I don't really see why a classic community-driven open source
| project would care about what non-contributing users are doing
| with the software. In that case, helpful users come with built-in
| telemetry (pull requests).
|
| But I guess this could be helpful for corporatized read-only repo
| projects, or other groups that aren't sure if they are building a
| community or a customer base.
| deathanatos wrote:
| This week one of my tasks is to figure out how to neutralize some
| telemetry in one of our apps. We had no idea it was there, we do
| not want to be sending data. Last week, the parent company
| decided they didn't want to maintain the telemetry server any
| longer, and got rid of it.
|
| Now the tool has generated thousands of log messages complaining
| that it can't phone home.
|
| And so it must be silenced, since it is cluttering up the logs,
| generating false alerts, etc.
|
| Please, no more.
| JohnFen wrote:
| The existence of telemetry is the main reason why I avoid
| using new software anymore. Really, opt-in, opt-out, it doesn't
| matter. I can't trust that any of those mechanisms actually
| work, that if I opt out, an update won't reenable it, or that
| the data collected is actually limited and anonymized.
| 4ad wrote:
| This is just part 1, but all articles in the series have been
| published: https://research.swtch.com/telemetry
| kabdib wrote:
| Oh, hell no.
| tgv wrote:
| They could allow public access to that data. That can help more
| people than just the Go team, and it would add transparency.
| mseepgood wrote:
| That's literally the plan if you read it.
| candiddevmike wrote:
| Is this even up for debate, or is this post more of a FYI?
| wrldos wrote:
| No, it's not up for debate at all. Much like when Microsoft did
| this with .NET Core, the GitHub thread is clearly a misguided
| post by RSC expecting the community to conform or support it.
| They didn't so now it's a damage control exercise. It will
| happen.
|
| Any corporate controlled project on this scale is prone to this
| failure mode.
| slimsag wrote:
| I've been a pretty strong advocate of the idea that analytics
| should always be minimal, 100% anonymous, aggregated, and open to
| the public - otherwise it's spying. This is how we do analytics
| on our websites today[0][1], and how we plan to do it in games we
| release in the future. Maybe one day I will start a dedicated
| FOSS service that people can use for exactly this with some
| trusted reputation/transparency/auditability to it.
|
| I think what Russ has described here is decent and well-reasoned.
| I also think that Go being a product (it is, whether you like
| that word or not) makes it more fair to desire analytics of this
| form. I think it being opt-out is reasonable (after all, if it is
| not, they will make decisions using data that does not come from
| the vast majority of users, may as well not have analytics at all
| then.)
|
| But I am afraid of this becoming pervasive not just in products
| (like CLI tools), but also in libraries, imagine every Go/npm
| package you use wants to ping the network because the authors
| want to know 'is this popular? can we deprecate XYZ method?' etc.
| If transparent telemetry in the form Russ and I have been viewing
| it becomes a more common thing, it won't be a surprise if more
| library authors begin to try to adopt something like this and it
| becomes a pervasive problem IMHO.
|
| [0] https://hexops.com/privacy
|
| [1] https://machengine.org
| vineyardmike wrote:
| > the authors want to know 'is this popular? can we deprecate
| XYZ method?'
|
| This is something that was common for internal libraries at
| some of the places I've worked. I'm honestly a little surprised
| it isn't a thing we see externally. I for sure do not want to
| see it, but I'm surprised we don't. It's probably enough to look
| at the public usage on GitHub, and make inferences and post
| notice on future-major-versions of libraries. GitHub honestly
| should make a tool to do this, they'd have a huge opportunity
| to inspect the data.
| msla wrote:
| > I've been a pretty strong advocate of the idea that analytics
| should always be minimal, 100% anonymous, aggregated, and open
| to the public
|
| And opt-in.
|
| > I also think that Go being a product (it is, whether you like
| that word or not) makes it more fair to desire analytics of
| this form.
|
| Not by stealing it.
| rsc wrote:
| I am concerned about run-time telemetry in libraries as well.
| It might make sense for language ecosystems to offer more data
| about library usage gathered at build time eventually, as a
| different system than the one I'm posting about today. I think
| when you get to that level of detail you probably need to start
| thinking hard about differential privacy and probably
| cryptographic solutions like ESA or Prio. I don't think we know
| enough to design the library solution yet.
| JohnFen wrote:
| Telemetry embedded in libraries is simply abusive, in my
| opinion. At the very least, the decision about whether or not
| to include telemetry should be made by the application
| developers, not the toolmakers.
| rsc wrote:
| Right. My hope would be that language tooling offering
| library developers visibility into compile-time information
| about library usage would reduce their desire to insert
| run-time collection instead.
| infogulch wrote:
| This is a good plan, very simple and clear, and I like the list
| of system properties at the end. The solution is pretty tailored
| for the Go toolchain, which is a good strategy that has worked
| for them in the past.
|
| A more general purpose metrics tool I'm watching closely is Divvi
| Up https://divviup.org/, a research project by ISRG, the same org
| that runs Let's Encrypt. The basic idea is to divide up each metric
| into two parts and publish each part to separate collection
| servers (one run by you and the other by divviup). Then the
| servers separately aggregate their half and combine the results,
| the idea being that each half is useless on its own but when
| combined it's still useful.
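The two-server split can be sketched with additive secret sharing, the basic building block behind Prio-style systems. This assumes sums modulo 2^64 and omits the input-validity proofs that real Prio deployments add; `split`, `aggregator`, and the sample values are illustrative only.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// split turns a value into two additive shares modulo 2^64 (uint64
// arithmetic wraps around). Either share alone is uniformly random and
// reveals nothing about v.
func split(v uint64) (a, b uint64) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	a = binary.LittleEndian.Uint64(buf[:])
	b = v - a
	return a, b
}

// aggregator stands in for one of the two non-colluding servers. It only
// ever sees its own share of each report.
type aggregator struct{ sum uint64 }

func (g *aggregator) add(share uint64) { g.sum += share }

func main() {
	// Hypothetical per-client metric: how many times a flag was used.
	values := []uint64{3, 0, 7, 1, 4}
	var leader, helper aggregator
	for _, v := range values {
		a, b := split(v)
		leader.add(a) // sent to the first server
		helper.add(b) // sent to the second server
	}
	// Only combining both aggregates reveals anything, and then only the total.
	fmt.Println(leader.sum + helper.sum) // 15
}
```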
|
| I wouldn't suggest it for this application, but for the majority
| of typical apps it would be a vast improvement to privacy
| compared to the status quo.
| autoexec wrote:
| > Although the report would not include any identifiers, the TCP
| connection uploading the report would expose the system's public
| IP address to the server if a proxy is not being used. This IP
| address would not be associated with the uploaded reports in any
| way.
|
| Any fully transparent data collection is going to have to include
| IP addresses and timestamps. Even if the IP isn't being used for
| debugging, the software still phones home and the IP is still
| being collected and logged when it otherwise wouldn't be. Either
| when uploading the report or when downloading the "collection
| configuration".
|
| Honestly, assuming full transparency, I'm not opposed to the
| concept. I question how much telemetry is actually necessary, but
| I'm certain there will be times when it's nice to have. It'd also
| be interesting to see how it would go when for once people can
| see exactly what is collected, when, and from where.
|
| I'm not sure that Google is the best place to showcase such a
| concept though. I'm sure there are a lot of people who have no
| problem with handing more data over to Google, but Google has
| abused the public's good will for the sake of data collection
| many times, and it's sure to put off some of the people who
| aren't already completely disgusted by the idea of their favorite
| open source projects collecting telemetry.
| mananaysiempre wrote:
| > Any fully transparent data collection is going to have to
| include IP addresses and timestamps. Even if the IP isn't being
| used for debugging, the software still phones home and the IP
| is still being collected and logged when it otherwise wouldn't
| be. Either when uploading the report or when downloading the
| "collection configuration".
|
| How _do_ you verifiably not collect users' IP addresses when
| receiving data from them? The verifiable part is the problem,
| of course you can (and should) just not log the addresses, but
| then the users can only trust you (and hope you or your uplink
| haven't received any legal orders to the contrary). The only
| approach I can think of would be a Tor hidden service, but
| while it would technically work, as far as not exposing your
| users to scrutiny it actually sounds worse.
| rsc wrote:
| The only option is to have a proxy sit in the middle between
| the uploader and the server. You mentioned Tor but it doesn't
| have to be Tor, just some proxy most users would trust not to
| collude with the server and that doesn't itself derive
| benefit from seeing the IP addresses. If there were a
| different entity that could be relied upon to run servers
| doing this and were highly trusted by users, I'd be
| interested to use it. Failing that, the usual answer for an
| enterprise or company is to run their own HTTP proxy. The
| design explicitly supports that.
| geodel wrote:
| > their favorite open source projects collecting telemetry.
|
| Their favorite _Google_ open source project. This is especially
| important for projects which can't realistically exist without a
| main sponsor / benefactor. It also helps people pay whatever
| little/high cost in terms of conscience when they take part in or
| consume something willingly but do not approve of its makers.
| JohnFen wrote:
| > When you hear the word telemetry, if you're like me, you may
| have a visceral negative reaction to a mental image of intrusive,
| detailed traces of your every keystroke and mouse click headed
| back to the developers of the software you're using.
|
| But that's not my only objection to telemetry. Equally important
| to me is that so many bad decisions are justified based on
| telemetry. It's very easy to misunderstand the data, because
| telemetry leaves out so much, but developers often treat it as if
| it's giving a complete picture.
|
| As an example, I have seen developers drop really important
| functionality on the basis that it is rarely used. While that was
| true, it was also true that when those rare times happen, that
| functionality was absolutely critical to have.
| userbinator wrote:
| Or they use the data as an accelerant: move rarely used
| features to places where they're even less discoverable, making
| them even less used, and then remove them altogether. The
| justification then becomes a self-fulfilling prophecy.
| msla wrote:
| The only moral response is to send false data to the servers.
| alyandon wrote:
| Oh Google - never stop being you.
|
| Not only is it going to be opt-out (because of course it would be
| coming from Google), I really like the whole "wait a week before
| sending telemetry" part that just coincidentally has the benefit
| of sneaking right past people that actively look for suspicious
| network activity when they've freshly installed something.
|
| Am I being uncharitable?
| geodel wrote:
| [flagged]
| agwa wrote:
| Since you asked, yes you are being uncharitable. It's rather
| hard to imagine that the people who are details-oriented enough
| to look for suspicious network activity after installing
| something wouldn't notice the disclosure on the download page
| (edit: or the release notes). On the other hand, the
| explanation given by Russ for delaying a week (so people have
| ample time to opt-out) makes sense.
|
| Do you actually think Russ' explanation is just a pretext so
| they can evade detection by people who monitor for suspicious
| network activity (yet don't notice the disclosure on the
| download page)?
| alyandon wrote:
| I am jaded and probably being a little uncharitable. However,
| I don't know Russ personally so I have no reason to place a
| high level of confidence that a Google employee isn't going
| to make decisions that align more with Google's interests vs
| privacy interests.
|
| Regardless, there are plenty of ways to upgrade the Go tool
| chain (snaps, distro packages, fetching latest via curl, etc)
| that won't result in the changes being immediately visible.
| Given that, I think you are painting an overly optimistic
| picture of a world though where everyone that cares about
| this is going to be immediately aware that opt-out telemetry
| has been added vs a lot of installs being silently swept up
| into this by sheer ignorance.
|
| Also, this is going to require me to go and set environment
| variables in about a dozen environments to disable the
| collection and while I can pretty easily manage that task via
| ansible I'm not happy about having to jump through hoops to
| turn off telemetry for a freaking compiler tool chain.
| agwa wrote:
| > I am jaded and probably being a little uncharitable.
| However, I don't know Russ personally so I have no reason
| to place a high level of confidence that a Google employee
| isn't going to make decisions that align more with Google's
| interests vs privacy interests.
|
| If the nature of this data were different, I would be
| suspicious too. But it's really hard for me to see how a
| set of counters (whose names have various protections to
| ensure they can't contain private information) being sent
| approximately once a year is going to help with Google's
| advertising interests (which is what I assume you meant by
| "Google's interests"; I think they also have an interest in
| making Go better and the telemetry proposal aligns with
| that). This is literally the first time I've been OK with
| telemetry.
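Here is one way such a name protection could work. This minimal sketch assumes the client uploads only those counters whose names appear in a published allowlist, so an arbitrary string such as a file path can never leave the machine. The Report type, filterReport, and every counter name below are hypothetical, and the real design's protections may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Report is a hypothetical upload payload: counter name -> count.
type Report map[string]uint64

// filterReport keeps only counters whose names appear in the published
// allowlist; everything else stays on the local machine.
func filterReport(local Report, allowlist []string) Report {
	allowed := make(map[string]bool, len(allowlist))
	for _, name := range allowlist {
		allowed[name] = true
	}
	out := Report{}
	for name, n := range local {
		if allowed[name] {
			out[name] = n
		}
	}
	return out
}

func main() {
	// All counter names here are hypothetical examples.
	local := Report{
		"compile/flag:-race":                 12,
		"compile/goarch:arm":                 3,
		"compile/file:/home/alice/secret.go": 1, // must never leave the machine
	}
	allowlist := []string{"compile/flag:-race", "compile/goarch:arm"}

	out := filterReport(local, allowlist)
	names := make([]string, 0, len(out))
	for name := range out {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, name := range names {
		fmt.Printf("%s %d\n", name, out[name])
	}
}
```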
|
| > Regardless, there are plenty of ways to upgrade the Go
| tool chain (snaps, distro packages, fetching latest via
| curl, etc) that won't result in the changes being
| immediately visible. Given that, I think you are painting
| an overly optimistic picture of a world though where
| everyone that cares about this is going to be immediately
| aware that opt-out telemetry has been added vs a lot of
| installs being silently swept up into this by sheer
| ignorance.
|
| I agree there will be people who won't notice the
| disclosure (which will also be in the release notes), but
| again I tend to think that the people sniffing network
| traffic after installing a program would also scrutinize
| release notes instead of just blindly installing upgrades,
| which is why I find it pretty improbable that Russ'
| explanation was a pretext.
|
| > Also, this is going to require me to go and set
| environment variables in about a dozen environments to
| disable the collection and while I can pretty easily manage
| that task via ansible I'm not happy about having to jump
| through hoops to turn off telemetry for a freaking compiler
| tool chain.
|
| I think the best suggestion I've seen is that there should
| be a single environment variable (e.g. $TELEMETRY) that all
| programs should respect, to avoid the need to do work for
| every application.
| JohnFen wrote:
| > I think the best suggestion I've seen is that there
| should be a single environment variable (e.g. $TELEMETRY)
| that all programs should respect, to avoid the need to do
| work for every application.
|
| This is a nonstarter, as DNT demonstrated in spades.
| arp242 wrote:
| > there should be a single environment variable (e.g.
| $TELEMETRY) that all programs should respect, to avoid
| the need to do work for every application.
|
| There was a proposal for that some years ago, but it didn't
| really go anywhere, partially because of the author's rather
| unpleasant attitude towards the projects he wanted to adopt
| it, and partially because of its overly broad definition of
| "tracking" (which includes e.g. update checks).
|
| Some discussions:
|
| https://news.ycombinator.com/item?id=27746587
|
| https://lobste.rs/s/htbkqd/console_do_not_track
| Thaxll wrote:
| Very popular programming languages and IDEs have telemetry on by
| default: VSCode, C#, Java, etc.
|
| People act like they discovered telemetry in 2023.
|
| I don't think it's a big deal; ultimately it's to improve Go,
| and the proposal makes it very easy to disable (a single env
| variable).
| xh-dude wrote:
| Totally agree. Telemetry has been around and matured and
| benefits users. I'm not sure the benefits for Go would be as
| significant as for other software but, really, why not?
| masklinn wrote:
| > Telemetry has been around and matured and benefits users.
|
| Does it? Telemetry mostly seems used to justify removing
| features I need on grounds that they're little used.
|
| As another user noted, if telemetry is your yardstick, the
| average backup software would remove the "restore" feature
| because it's barely ever used.
| xh-dude wrote:
| I don't like this example in particular because observing
| too much "restore" activity is an excellent piece of
| information.
| arp242 wrote:
| There is a long list of use cases, which go far beyond
| "removing features":
| https://research.swtch.com/telemetry-uses
|
| "Is it safe to remove support for X?" is one use case.
| Right now the strategy more or less amounts to "remove
| and see if anyone complains, possibly too late to
| change".
| jiggawatts wrote:
| Microsoft deprecated the disk-image backup in Windows 7
| because it was infrequently used... by random
| grandparents.
|
| It was basically a "free" wrapper on top of the Volume
| Shadow Copy Service (VSS) built into the operating system,
| but only IT professionals ever used it, so... it had to go.
| alyandon wrote:
| I think they should all be opt-in as well. However, as a
| developer and pretend sysadmin, I am generally a nice guy
| about not turning off telemetry on software products with a
| user facing UI that I use frequently.
| _ph_ wrote:
| If there is any virtue to collecting telemetry, make it opt-in.
| Any developer convinced of this being useful will gladly enable
| it. But making it opt-out is just nefarious, because most users
| will not be aware of it.
| Thaxll wrote:
| This is naive: no one ever turns telemetry on if it's turned off
| by default. That's the reason it's on by default.
| msla wrote:
| "This is naive, no one would ever allow me into their homes
| if I asked first, and how else would I find out what diseases
| they have?"
| ocdtrekkie wrote:
| > no one ever turns telemetry on if it's turned off by default
|
| If nobody would voluntarily do it, why do you think it's okay
| to do it at all? By your very admission, _nobody wants this_.
| Because if they did, they'd turn it on!
| _ph_ wrote:
| Still, opt-out is just unacceptable, at least with a mechanism
| that can fail as easily as setting an environment variable.
| This basically forces you to wrap the go tool in a script that
| ensures the environment variable is set.
|
| As this seems to cache the results, another option is to
| fiddle with the cache to report bogus information.
| Thaxll wrote:
| You can set env variables for the go toolchain with a
| command such as go env -w TELEMETRY=off; the setting is
| written to disk and used by the go CLI.
| _ph_ wrote:
| Where will it be written to and how is it guaranteed to
| be picked up by any further invocations of the tools?
| mftb wrote:
| I hope this proposal is defeated and they don't implement this. I
| don't buy the premise that the benefit is worth the price. I
| think CLI tools like the ones in the Go Toolchain and their usage
| patterns are fairly well understood by this point. I'm sick and
| tired of every piece of software I interact with phoning home.
|
| That said, as long as they give me reasonable means to configure
| the software the way I want, it's probably not a deal-breaker for
| me. In other words, I will just set the $ENV_VAR_WHATEVER to turn
| this off, and that's that.
| teraflop wrote:
| I am all for transparency and limited intrusiveness of telemetry.
|
| But in practical terms, the problem with this approach -- if I'm
| understanding it correctly -- is that it has no way to detect and
| reject outliers, and therefore the data can't be validated in any
| way. It only makes sense if all your clients are 100%
| trustworthy.
|
| Let's say you want to know whether to keep supporting ARMv5, and
| your data says 10% of users are using it. There's no way to tell
| whether that's accurate, or if you have 0.01% of die-hard users
| who modified their telemetry code to report 1000x as frequently
| as they're supposed to. Even if you suspect this is happening
| (and you might not), there's no way to identify the culprit and
| filter out their data without tracking personal identifiers such
| as IP addresses.
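A little arithmetic checks teraflop's scenario. Suppose a fraction f of users each report m times as often as everyone else. Their share of all reports is then f*m / (f*m + (1 - f)). With f = 0.0001 and m = 1000, that is 0.1 / 1.0999, about 9.1%, close to the 10% in the example. The helper name below is illustrative.

```go
package main

import "fmt"

// apparentShare returns the fraction of all received reports that come from
// a group making up `fraction` of users, when each of them reports
// `multiplier` times as often as everyone else (who report once each).
func apparentShare(fraction, multiplier float64) float64 {
	heavy := fraction * multiplier
	normal := 1 - fraction
	return heavy / (heavy + normal)
}

func main() {
	// teraflop's scenario: 0.01% of users reporting 1000x as often.
	fmt.Printf("%.1f%%\n", 100*apparentShare(0.0001, 1000)) // prints 9.1%
}
```

Without some per-client identifier, the server cannot distinguish this from roughly 9% of genuine users, which is the comment's point.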
|
| So even if _most of the time_ the telemetry data is valid, over
| time it will trend toward uselessness, because it can be
| endlessly second-guessed unless it confirms a decision you wanted
| to make anyway.
| ergonaught wrote:
| On-by-default makes me question whether rsc's judgement has been
| compromised, which leads me to question continuing to use the
| language. A strange miss for him.
| nicce wrote:
| Off-by-default at scale likely means that there is no
| telemetry at all. I would not cancel a guy or a programming
| language just for suggesting that. He has given it a lot of
| thought, if you read the blog posts.
| cwkoss wrote:
| If I take a dollar from everyone but it's opt-out, that's
| still theft.
|
| If I make it opt in, nobody is going to give me the dollar,
| but that doesn't make opt out morally justifiable.
| nicce wrote:
| You are comparing apples to oranges. Telemetry is a curse
| word these days, but you should still read his posts.
| whoopsie wrote:
| Opaque telemetry can also be a barrier to adoption: my users' IP
| addresses may legally be PII that I cannot disclose.
| stefanos82 wrote:
| From a legal point of view, how will companies react to this, be
| it default-on or default-off?
|
| Some companies use it internally, and I'm sure all of us know of
| cases involving a number of NDA-ed projects from third parties or
| outsourced companies that collaborate on the matter.
|
| So, who is going to sue whom here when one party disables (or
| has already disabled) the telemetry and the other has it on by
| default, for whatever reason?
___________________________________________________________________
(page generated 2023-02-08 23:00 UTC)