[HN Gopher] We collected 10k hours of neuro-language data in our...
       ___________________________________________________________________
        
       We collected 10k hours of neuro-language data in our basement
        
       Author : nee1r
       Score  : 70 points
       Date   : 2025-12-08 17:33 UTC (5 hours ago)
        
 (HTM) web link (condu.it)
 (TXT) w3m dump (condu.it)
        
       | ArjunPanicksser wrote:
       | Makes sense that CL ends up being the best for recruiting first-
       | time participants. Curious what other things you tried for
       | recruitment and how useful they were?
        
         | n7ck wrote:
         | The second most useful by far is Indeed, where we post an
         | internship opportunity for participants interested in doing 10
         | sessions over 10 weeks. Other things that work pretty well are
         | asking professors to send out emails to students at local
         | universities, putting up ~300-500 fliers (mostly around
         | universities and public transit), and posting on Nextdoor. We
          | also just texted a lot of group chats, posted on LinkedIn,
          | and gave out fliers and the signup link to pretty much
          | everyone we talked to in cafes and similar. We take on some
          | participants as ambassadors as well, and pay them to refer
          | their friends.
         | 
         | We tried google/facebook/instagram ads, and we tried paying for
          | some video placements. Basically none of the paid
          | advertising worked at all, and it wasn't worth the money.
         | Though for what it's worth, none of us are experts in
         | advertising, so we might have been going about it wrong -- we
         | didn't put loads of effort into iterating once we realized it
         | wasn't working.
        
       | mishajw wrote:
       | Interesting dataset! I'm curious what kind of results you would
       | get with just EEG, compared to multiple modalities? Why do
       | multiple modalities end up being important?
        
         | n7ck wrote:
          | EEG has very good temporal resolution but quite bad spatial
          | resolution, and other modalities have different tradeoffs.
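          | 
          | Rough orders of magnitude for intuition (exact numbers
          | depend a lot on hardware and preprocessing):
          | 
          |     # ballpark resolution tradeoffs, illustrative only
          |     modalities = {
          |         #  name:    (temporal,    spatial)
          |         "EEG":     ("~1 ms",     "~cm"),
          |         "fNIRS":   ("~0.1-1 s",  "~1-3 cm"),
          |         "fMRI":    ("~1-2 s",    "~1-3 mm"),
          |     }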
        
       | g413n wrote:
       | what's the basis for conversion between hours of neural data to
       | number of tokens? is that counting the paired text tokens?
        
         | rio-popper wrote:
          | Edit: oops, sorry, misread -- the neural data is tokenised
          | by our embedding model. The number of tokens per second of
          | neural data varies and depends on the information content.
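          | 
          | A minimal sketch of the idea, not our actual model -- the
          | window size, vocab size, and variance proxy here are all
          | made up for illustration:
          | 
          |     import numpy as np
          |     
          |     def tokenize(stream, sr=256, win_s=1.0, max_tok=4):
          |         """stream: (channels, samples) -> fake token ids,
          |         with more tokens for higher-variance windows."""
          |         win = int(sr * win_s)
          |         toks = []
          |         for s in range(0, stream.shape[1] - win + 1, win):
          |             chunk = stream[:, s:s + win]
          |             # crude proxy for information content
          |             rich = chunk.var() / (stream.var() + 1e-8)
          |             n = int(np.clip(max_tok * rich, 1, max_tok))
          |             toks.extend(
          |                 np.random.randint(0, 1024, n).tolist())
          |         return toks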
        
       | n7ck wrote:
       | Hey I'm Nick, and I originally came to Conduit as a data
       | participant! After my session, I started asking questions about
       | the setup to the people working there, and apparently I asked
       | good questions, so they hired me.
       | 
       | Since I joined, we've gone from <1k hours to >10k hours, and I've
       | been really excited by how much our whole setup has changed. I've
       | been implementing lots of improvements to the whole data pipeline
       | and the operations side. Now that we train lots of models on the
       | data, the model results also inform how we collect data (e.g. we
       | care a lot less about noise now that we have more data).
       | 
       | We're definitely still improving the whole system, but at this
       | point, we've learned a lot that I wish someone had told us when
       | we started, so we thought we'd share it in case any of you are
       | doing human data collection. We're all also very curious to get
       | any feedback from the community!
        
         | internet_points wrote:
         | I thought that kind of career change only happened in The Sims
         | :-)
        
           | n7ck wrote:
           | hahahah tell me about it!
        
       | Gormisdomai wrote:
       | The example sentences generated "only from neural data" at the
       | top of this article seem surprisingly accurate to me, like, not
       | exact matches but much better than what I would expect even from
       | 10k hours:
       | 
       | "the room seemed colder" -> " there was a breeze even a gentle
       | gust"
        
         | ninapanickssery wrote:
         | Yeah, agreed
        
         | jcims wrote:
         | Exactly. And honestly both this example and the one about the
         | woman seemed to be what I would actually think/feel vs what I
         | say.
         | 
         | Very interesting!
        
         | CobrastanJorji wrote:
         | Tangential to your point, if you collect 10,000 hours of brain
         | scanning in exactly one damp basement, I wonder if perhaps the
         | model would become very, very specialized for all of the
         | flavors of "this room seems colder."
        
           | rio-popper wrote:
           | For the record, it was two basements -- we moved office in
           | the middle -- and a bigger issue was actually overheating.
           | But your point is basically right! The model is a lot better
           | at certain kinds of ideas than others. Particularly
           | concerning was the fact that the first cluster I noticed
           | getting good was all the different variations of 'the headset
           | is uncomfortable/heavy' etc. But this makes sense -- what
           | participants talk about has a lot to do with what kinds of
            | ideas the model can pick up, and this was more or less
            | what we expected.
        
       | ag8 wrote:
       | This is a cool setup, but naively it feels like it would require
       | hundreds of thousands of hours of data to train a decent
       | generalizable model that would be useful for consumers. Are there
       | plans to scale this up, or is there reason to believe that tens
       | of thousands of hours are enough?
        
         | n7ck wrote:
         | Yeah I think the way we trained the embedding model focused a
         | lot on how to make it as efficient as possible, since it is
         | such a data-limited regime. So I think based on (early) scaling
          | results, it'll be closer to 50-70k hours, which we should
          | be able to get in the next few months now that we've
          | already scaled up a lot.
         | 
         | That said, the way to 10-20x data collection would be to open a
         | couple other data collection centers outside SF, in high-
         | population cities. Right now, there's a big advantage in just
          | having the data collection totally in-house, because it's
          | so much easier to debug and improve while we're this small.
          | But now that we've mostly worked out the process, it should
          | also be very straightforward for us to just replicate the
          | entire ops/data pipeline in 3-4 parallel data collection
          | centers.
        
       | richardfeynman wrote:
       | This is an interesting dataset to collect, and I wonder whether
       | there will be applications for it beyond what you're currently
       | thinking.
       | 
       | A couple of questions: What's the relationship between the number
       | of hours of neurodata you collect and the quality of your
       | predictions? Does it help to get less data from more people, or
       | more data from fewer people?
        
         | n7ck wrote:
          | 1. The predictions get better with more data -- and we
          | don't seem to be anywhere near diminishing returns.
          | 
          | 2. The thing we care about is generalization between
          | people. For this, less data from more people is much
          | better.
        
           | richardfeynman wrote:
           | I noticed you tracked sessions per person, implying a subset
           | of people have many hours of data collected on them. Are
           | predictions for this subset better than the median?
           | 
           | For a given amount of data, is it better to have more people
           | with less data per person or fewer people with more data per
           | person?
        
             | clemvonstengel wrote:
             | Yes, the predictions are much better for people with more
             | hours of data in the training set. Usually, we just totally
             | separate the train and val set, so no individual with any
             | sessions in the train set is ever used for evals. When we
             | instead evaluate on someone with 10+ hours in the train
             | set, predictions get ~20-25% better.
             | 
             | For a given amount of data, whether you want more or less
             | data per person really depends on what you're trying to do.
             | The thing we want is for it to be good at zero-shot, that
             | is, for it to decode well on people who have zero hours in
             | the train set. So for that, we want less data per person.
             | If instead we wanted to make it do as well as possible on
             | one individual, then we'd want way more data from that one
             | person. (So, e.g., when we make it into a product at first,
              | we'll probably finetune on each user for a while.)
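              | 
              | For concreteness, the kind of split we mean -- a
              | sketch using scikit-learn's GroupShuffleSplit on
              | fake data:
              | 
              |     # a participant's sessions never straddle
              |     # the train/val boundary
              |     import numpy as np
              |     from sklearn.model_selection import (
              |         GroupShuffleSplit)
              |     
              |     X = np.random.randn(1000, 64)      # features
              |     y = np.random.randint(0, 2, 1000)  # targets
              |     subj = np.random.randint(0, 500, 1000)
              |     
              |     gss = GroupShuffleSplit(
              |         n_splits=1, test_size=0.2, random_state=0)
              |     tr, va = next(gss.split(X, y, groups=subj))
              |     assert not set(subj[tr]) & set(subj[va])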
        
               | richardfeynman wrote:
               | Makes a ton of sense, thanks.
               | 
               | I wonder if there will be medical applications for this
               | tech, for example identifying people with brain or
               | neurological disorders based on how different their
               | "neural imaging" looks from normal.
        
       | wiwillia wrote:
       | Really interested in how accuracy improves with the scale of the
       | data set. Non-invasive thought-to-action would be a whole new
       | interaction paradigm.
        
       | devanshp wrote:
       | Cool post! I'm somewhat curious whether the data quality scoring
       | has actually translated into better data; do you have numbers on
       | how much more of your data is useful for training vs in May?
        
         | rio-popper wrote:
          | The real-time neural-quality checking was the most
          | important thing here. Before we rewrote the backend, only
          | 58-64% of participant hours were actually usable data. Now
          | it's 90-95%.
         | 
         | If you mean the text quality scoring system, then when we added
         | that, it improved the amount of text we got per hour of neural
         | data by between 30-35%. (That includes the fact that we filter
         | which participants we have return based on their text quality
         | scores)
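          | 
          | To give a flavor of what the real-time check does -- a toy
          | version with made-up thresholds (the actual checks are
          | more involved):
          | 
          |     import numpy as np
          |     
          |     def window_ok(win, flat_uv=0.1, rail_uv=200.0,
          |                   max_bad=0.25):
          |         """win: (channels, samples) EEG window in uV."""
          |         std = win.std(axis=1)
          |         flat = std < flat_uv     # dead electrode
          |         railed = np.abs(win).max(axis=1) > rail_uv
          |         return (flat | railed).mean() < max_bad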
        
       | rajlego wrote:
       | Did you consider trying to collect data in a much poorer country
       | that still has high quality English? e.g. the Philippines
        
         | rio-popper wrote:
         | Yeah we did consider this. For now, there's an advantage to
         | having the data collection in the same building as the whole
         | eng team, but once we hire a couple more engs, I expect we'll
          | just replicate the collection setup in other countries as
          | well.
        
       | estitesc wrote:
       | Loved watching this unfold in our basement. : )
        
       | dang wrote:
       | [under-the-rug stub]
       | 
       | [see https://news.ycombinator.com/item?id=45988611 for
       | explanation]
        
         | ClaireBookworm wrote:
          | Yoo this is sick!! Sometimes it might actually just be a
          | data game, so huge props to them for actually collecting
          | all that high-quality data.
        
         | ninapanickssery wrote:
         | This is very cool, thanks for writing about your setup in such
         | detail! It's impressive that you can predict stuff from this
          | noninvasive data. Are there similar existing datasets, or
          | is this the first of its kind?
        
         | cpeterson42 wrote:
         | Wild world we live in
        
       | titzer wrote:
       | I lol'd at the hardware "patch" that kept the software from
       | crashing--removing all but the alpha-numeric keys (!?). Holy cow,
       | you had time to collect thousands of hours of neurotraces but
       | couldn't sanitize the inputs to remove a stray [? That
       | sounds...funky.
        
         | NoraCodes wrote:
         | Presumably it's more like an errant Ctrl-C.
        
           | clemvonstengel wrote:
           | Yup exactly this. Also Ctrl-W, alt tab, etc.
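            | 
            | Roughly the software-side equivalent, as a toy sketch
            | (assuming a Tkinter-style key event; not our actual
            | code):
            | 
            |     ALLOWED = set(
            |         "abcdefghijklmnopqrstuvwxyz0123456789 ")
            |     
            |     def on_key(event):
            |         # shortcut combos (Ctrl-C, Ctrl-W, ...) send
            |         # control chars that fail this whitelist test
            |         if event.char.lower() in ALLOWED:
            |             return None      # let the task see it
            |         if event.keysym == "BackSpace":
            |             return None
            |         return "break"       # swallow everything else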
        
       | in-silico wrote:
       | It's interesting that the model generalizes to unseen
       | participants. I was under the impression that everyone's brain
       | patterns were different enough that the model would need to be
       | retrained for new users.
       | 
       | Though, I suppose if the model had LLM-like context where it kept
       | track of brain data and speech/typing from earlier in the
       | conversation then it could perform in-context learning to adapt
       | to the user.
        
         | clemvonstengel wrote:
         | Basically correct intuition: the model does much better when we
         | give it, e.g., 30 secs of neural data in the leadup instead of
         | e.g. 5 secs. My sense is also that it's learning in context, so
         | people's neural patterns are quite different but there's a
         | higher-level generator that lets the model learn in context (or
         | probably multiple higher-level patterns, each of which the
         | model can learn from in context).
         | 
         | We only got any generalization to new users after we had >500
          | individuals in the dataset, fwiw. There are some
          | interesting MRI studies finding a similar thing: once you
          | have enough individuals in the dataset, you start seeing
          | generalization.
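          | 
          | Schematically, what "more lead-up" means at decode time
          | (a toy sketch; the tokens-per-second rate is made up):
          | 
          |     def build_input(neural_toks, end_s, ctx_s=30,
          |                     tok_per_s=10):
          |         """Slice ctx_s seconds of tokens before end_s."""
          |         hi = int(end_s * tok_per_s)
          |         lo = max(0, hi - ctx_s * tok_per_s)
          |         return neural_toks[lo:hi]  # longer -> better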
        
       | asgraham wrote:
       | Really cool dataset! Love seeing people actually doing the hard
       | work of generating data rather than just trying to analyze what
       | exists (I say this as someone who's gone out of his way to avoid
       | data collection).
       | 
       | Have you played at all with thought-to-voice? Intuitively I'd
       | think EEG readout would be more reliable for spoken rather than
       | typed words, especially if you're not controlling for keyboard
       | fluency.
        
         | clemvonstengel wrote:
         | Yeah we do both text and voice (roughly 70% of data collection
         | is typed, 30% spoken). Partly this is to make sure the model is
         | learning to decode semantic intent (rather than just planned
         | motor movements). Right now, it's doing better on the typed
         | part, but I expect that's just because we have more data of
         | that kind.
         | 
         | It does generalize between typed and spoken, i.e. it does much
         | better on spoken decoding if we've also trained on the typing
         | data, which is what we were hoping to see.
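          | 
          | One way to picture why that transfer happens -- a
          | hypothetical PyTorch-flavored sketch (not our actual
          | architecture): a shared encoder with per-modality heads,
          | trained on both kinds of sessions:
          | 
          |     import torch.nn as nn
          |     
          |     class Decoder(nn.Module):
          |         """Shared encoder, per-modality heads."""
          |         def __init__(self, d=512, vocab=32000):
          |             super().__init__()
          |             layer = nn.TransformerEncoderLayer(
          |                 d_model=d, nhead=8, batch_first=True)
          |             self.enc = nn.TransformerEncoder(layer, 6)
          |             self.heads = nn.ModuleDict({
          |                 "typed": nn.Linear(d, vocab),
          |                 "spoken": nn.Linear(d, vocab),
          |             })
          |     
          |         def forward(self, x, modality):
          |             return self.heads[modality](self.enc(x))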
        
           | asgraham wrote:
           | Interesting! I imagine speech-related motor artifacts don't
           | help matters either, even if noise starts mattering less at
           | scale.
        
             | n7ck wrote:
             | Yeah -- we have the participants use chinrests as well,
             | which reduces head motion artifacts for typing but less so
             | for speaking (because they have to move their heads for
              | that, of course). So a lot of the data is with them
              | keeping their heads quite still, although the model is
              | becoming much more robust to this over time.
        
       | whatshisface wrote:
       | What's the plan for after this mind reading helmet works
       | reliably?
        
         | brovonov wrote:
         | Sell it to an ad agency.
        
         | clemvonstengel wrote:
          | We're building headsets that let you control your computer
          | directly with your mind. Initially I expect we can get
          | increased bandwidth / efficiency on common tasks (including
          | coding) -- but I think it gets really exciting when people
          | start designing new software / interaction paradigms with
          | this in mind.
        
           | whatshisface wrote:
           | If you want it to be remembered as a revolutionary computer
           | interface, you will have to make sure it is not used in
           | interrogations.
        
       | xg15 wrote:
       | It's an enormously cool project (and also feels like the next
       | logical thing to do after all the existing modalities)
       | 
        | But it feels eerie to read a detailed story of how they
        | built and improved their setup and what obstacles they
        | encountered, complete with photos - without any mention of
        | _who_ is doing the things we are reading about. There is no
        | mention of the staff or even the founders on the whole
        | website.
       | 
       | I had a hard time judging how large this project even is. The
       | homebuilt booths and trial-and-error workflow sound like "three
       | people garage startup", but the bookings schedule suggests a
       | larger team.
       | 
        | (At least there is an author line on that blog post. I had
        | to google the names to get some background on the company.)
       | 
       | You should consider an "about us" page :)
        
         | rio-popper wrote:
         | Good point. We're a team of 7 right now (3 engineering, 4
         | running data collection across shifts). We've been spending
         | ~all our time on the data and model side, so the "About us"
         | page lagged behind, but we'll add one this week. Appreciate the
         | feedback!
        
           | xg15 wrote:
           | No question these are the more important things to spend time
           | on. Good luck!
        
       | accrual wrote:
       | Very cool project! I had a couple ideas during the read:
       | 
        | * A ceiling-based pulley system could help take the physical
        | load off the users and may allow for increased sensor
        | density. Some large/public VR setups do this.
       | 
        | * I'm sure you considered it, but a double-conversion UPS
        | might
       | reduce the noise floor of your sensors and could potentially
       | support multiple booths. Expensive though, and it's already
       | mentioned that data quantity > quality at this stage. Maybe a
       | future fine-tuning step could leverage this.
       | 
       | Cool write up and hope to see more in the future!
        
       | moffkalast wrote:
       | Your engineers were so preoccupied with whether or not they
       | could, they didn't stop to think if they should.
        
       ___________________________________________________________________
       (page generated 2025-12-08 23:00 UTC)