Posts by bob_zim@infosec.exchange
 (DIR) Post #AcgK3RC7SlEWpTvpMu by bob_zim@infosec.exchange
       2023-12-10T22:42:28Z
       
       1 likes, 0 repeats
       
       @mcv I had to fight that fight when Spectre/Meltdown were the shiny new flaws. “We need you to prove the firewalls and routers aren’t vulnerable to Spectre/Meltdown!” That whole class of flaw requires the ability to run code on the target system. If somebody who isn’t on my team can run *any* code on our firewalls and routers, we have much bigger problems.
       
 (DIR) Post #Ad3GqFs8UNBbpuXrge by bob_zim@infosec.exchange
       2023-12-22T01:07:16Z
       
       0 likes, 0 repeats
       
       @Wolven They should really get some professional help. https://mastodon.archive.org/@textfiles/111614271736680242
       
 (DIR) Post #AlKOyq2QzDp5NhI6eu by bob_zim@infosec.exchange
       2024-08-25T14:25:34Z
       
       0 likes, 0 repeats
       
       @futurebird Roko’s basilisk is even dumber than Pascal’s wager because the subject of the supposed eternal torture is a *copy* of you. No causal connection. If we assume an AI has this capability, why do they think its exercise of this capability would depend on the actions of the person who it illicitly copied?
       
 (DIR) Post #AlXJ77YDgX5fLICghE by bob_zim@infosec.exchange
       2024-08-31T18:01:55Z
       
       1 likes, 2 repeats
       
       @jerry @SunTzuCyber
       
 (DIR) Post #AoXq2kKjFl6FIUURH6 by bob_zim@infosec.exchange
       2024-11-29T19:36:24Z
       
       0 likes, 0 repeats
       
       @tomjennings @lattera Digging into this a bit more because your mention of motion detection twigged a memory. There’s a video recording platform called Frigate (https://frigate.video) which runs locally and which can use a Coral TPU (https://coral.ai) for motion detection and object classification. I don’t personally use it at this time, but it looks entirely open source with some ML models behind a $50/yr paywall.
       
       They have a few cameras listed in the Recommended Hardware section of their documentation:
       • Loryta (Dahua) IPC-T549M-ALED-S3
       • Loryta (Dahua) IPC-T54IR-AS
       • Amcrest IP5M-T1179EW-AI-V3
       
 (DIR) Post #Aq5s7Z0UCk3jxYCHyK by bob_zim@infosec.exchange
       2025-01-15T02:44:50Z
       
       1 likes, 1 repeats
       
       @catsalad Kitties of *ALL* sizes.
       
 (DIR) Post #Astfpd3aLHNitFoAaW by bob_zim@infosec.exchange
       2025-04-08T23:16:28Z
       
       0 likes, 0 repeats
       
       @mcc @nickzoic Is it not just back bacon?
       
 (DIR) Post #AuEXNGy9ApluAELpBI by bob_zim@infosec.exchange
       2025-05-18T23:20:18Z
       
       0 likes, 0 repeats
       
       @wolf480pl @mildsunrise TIME_WAIT in particular is meant to ensure there’s not some packet in flight waiting to mess things up if the IPs and ports are reused.
       
       Traffic can take many paths with varying delay. Imagine a client connects to your server and most traffic gets sent over a very low-latency link, but they send a request right at the end over a satellite link (multiple seconds of latency). Before getting a response, the user closes the application, and you do the full FIN dance.
       
       Then some other application on the same client connects to you from the same source port over the fast link. Full handshake, and you start talking, then that long-delayed request finally arrives. It has a sequence number *way* off what you’re expecting, so it looks like you’ve missed a billion bytes.
       
       Yeah, it’s an extremely rare situation at best.
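       A minimal sketch of why that late segment looks so wrong, assuming the usual receive-window test on 32-bit sequence numbers. The function name and all the numbers are made up for illustration, and real TCP stacks check a good deal more than this.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits and wrap around


def in_receive_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Return True if seq falls in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


# Hypothetical state of the *new* connection on the server side.
rcv_nxt = 1_000_000  # next byte the server expects
rcv_wnd = 65_535     # receive window the server advertised

# A segment belonging to the new connection lands inside the window.
fresh = in_receive_window(1_000_100, rcv_nxt, rcv_wnd)

# The delayed segment from the *old* connection was numbered relative to an
# unrelated initial sequence number, so it falls far outside the window.
stale = in_receive_window(3_500_000_000, rcv_nxt, rcv_wnd)

print(fresh, stale)  # True False
```

       The dangerous case is the unlucky one where the stale sequence number happens to land inside the window. Holding the old connection in TIME_WAIT gives stray segments time to die before the same address and port pair can be reused.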
       
 (DIR) Post #AuEbH2cpzRWPkzfT6G by bob_zim@infosec.exchange
       2025-05-19T00:04:01Z
       
       0 likes, 0 repeats
       
       @wolf480pl @mildsunrise The TCP state machine dates back to the 70s, before most of those things were added, and when data connections were both slow and incredibly expensive. 😜 Packet switching existed, but circuit switching and dial-on-demand were extremely common.
       
       Now nobody wants to change it because it’s *everywhere*. A lot of systems provide a way to tune how long TIME_WAIT lasts before the connection is removed from RAM.
       
 (DIR) Post #AuOswEHbC5z2aPYhM0 by bob_zim@infosec.exchange
       2025-05-23T21:42:30Z
       
       0 likes, 1 repeats
       
       @kims @homohortus A lot of rural areas in the US have “speed enforced by aircraft” signs, which are basically the same thing. In this case, a helicopter or drone uses the speed radar. The US signs read like something out of Judge Dredd, though.
       
 (DIR) Post #Av4L4ZqnmKhrtkigzo by bob_zim@infosec.exchange
       2024-02-02T16:29:56Z
       
       2 likes, 5 repeats
       
       @shortridge While working tech support, I got a call on a Monday. Some VPNs which had been working on Friday were no longer working. After a little digging, we found the negotiation was failing due to a certificate validation failure.
       
       The certificate validation was failing because the system couldn’t check the certificate revocation list (CRL).
       
       The system couldn’t check the CRL because it was too big. The software doing the validation only allocated 512 kB to store the CRL, and it was bigger than that. This is from a private certificate authority, though, and 512 kB is a *LOT* of revoked certificates. Shouldn’t be possible for this environment to hit within a human lifespan.
       
       Turns out the CRL was nearly a megabyte! What gives? We check the certificate authority, and it’s revoking and reissuing every single certificate it has signed once per second.
       
       The revocations say all the certificates (including the certificate authority’s) are expired. We check the expiration date of the certificate authority, and it’s set to some time in 1910. What? It was around here I started to suspect what had happened.
       
       The certificate authority isn’t valid before some time in 2037. It was waking up every second, seeing the current date was after the expiration date, and reissuing everything. But time is linear, so it doesn’t make sense to reissue an expired certificate with an earlier not-valid-before date, so it reissued all the certs with the same dates and went to sleep. One second later, it woke up and did the whole process over again. But why the clearly invalid dates on the CA?
       
       The CA operation log was packed with revocations and reissues, but I eventually found the reissues which changed the validity dates of the CA’s certificate. Sure enough, it reissued itself in 2037 and the expiration date was set to 2037 plus ten years, which fell victim to the 2038 limitation.
       But it’s not 2037, so why did the system think it was?
       
       The OS running the CA was set to sync with NTP every 120 seconds, and it used a really bad NTP client which blindly set the time to whatever the NTP server gave it. No sanity checking, no drifting. Just get the time, set the time. OS logs showed most of the time, the clock adjustment was a fraction of a second. Then some time on Saturday, there was an adjustment of tens of thousands of seconds forward. The next adjustment was hundreds of thousands of seconds forward. Tens of millions of seconds forward. Eventually it hit billions of seconds backwards, taking the system clock back to 1904 or so. The NTP server was racing forward through the 32-bit timestamp space.
       
       At some point, the NTP server handed out a date in 2037 which was after the CA’s expiration. It reissued itself as I described above, and a date math bug resulted in a cert which expired before it was valid. So now we have an explanation for the CRL being so huge. On to the NTP server!
       
       Turns out they had an NTP “appliance” with a radio clock (e.g., a CDMA radio or GPS receiver). Whoever built it had done so in a really questionable way. It seems it had a faulty internal clock which was very fast. If it lost upstream time for a while, then reacquired it after the internal clock had accumulated a whole extra second, the server didn’t let itself step backwards or extend the duration of a second.
       The math it used to correct its internal clock somehow resulted in dramatically shortening the duration of a second until it wrapped in 2038 and eventually ended up at the correct time.
       
       Ultimately found three issues:
       • An OS with an overly-simplistic NTP client
       • A certificate authority with a bad date math system
       • An NTP server with design issues and bad hardware
       
       Edit: The popularity of this story has me thinking about it some more.
       
       The 2038 problem happens because when the most significant bit of a 32-bit value is 1 and you use it as a signed integer, it’s interpreted as a negative number in 2’s complement representation. But C has no protection from treating the same value as signed in some contexts and unsigned in others. If you start with a signed 32-bit integer with the value -1, it is represented in memory as 0xFFFFFFFF. If you then use it as an unsigned integer, it becomes the value 4,294,967,295.
       
       I bet the NTP box subtracted the internal clock’s seconds from the radio clock’s seconds as signed integers (getting -1 seconds), then treated it as an unsigned integer when figuring out how to adjust the tick rate. It suddenly thought the clock was four billion seconds behind, so it really has to sprint forward to catch up!
       
       In my experience, the most baffling behavior is almost always caused by very small mistakes. This small mistake would explain the behavior.
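       The signed/unsigned mix-up and the 2038 wrap can both be reproduced in a few lines. This is only a sketch of the arithmetic, not the appliance’s actual code (which nobody in the story got to see). The helper names are invented here, and ctypes is used purely to mimic what a C cast does to a 32-bit value.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_unsigned32(value: int) -> int:
    """Reinterpret a 32-bit value as unsigned, the way a C cast would."""
    return ctypes.c_uint32(value).value


def as_signed32(value: int) -> int:
    """Truncate to 32 bits and reinterpret as signed two's complement."""
    return ctypes.c_int32(value).value


def from_time32(seconds: int) -> datetime:
    """Convert a signed 32-bit Unix timestamp to a UTC datetime."""
    return EPOCH + timedelta(seconds=as_signed32(seconds))


# Guessed scenario: radio clock minus internal clock gives -1 second,
# and that offset is later read back as an unsigned value.
offset = -1
print(as_unsigned32(offset))       # 4294967295

# One second past the largest signed 32-bit timestamp wraps to 1901.
last_good = 2**31 - 1
print(from_time32(last_good))      # 2038-01-19 03:14:07+00:00
print(from_time32(last_good + 1))  # 1901-12-13 20:45:52+00:00
```

       The same wrap explains how an expiration date ten years past 2037 can come out in the early 1900s: the sum no longer fits in a signed 32-bit timestamp, so it reappears roughly 136 years earlier.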
       
 (DIR) Post #Av4L4fbmN6FrktjmPg by bob_zim@infosec.exchange
       2024-02-03T15:01:13Z
       
       2 likes, 2 repeats
       
       @shortridge Some time later, I was no longer working tech support. I got hired to do network and firewall stuff for a fairly large company. At one point, they decided to relocate the office where a lot of the operations and monitoring staff worked. They moved the whole application monitoring team to the new building with the unproven infrastructure first, because some people in charge made very bad decisions.
       
       The monitoring team gets to the new building, and they can’t access any of their monitoring systems. Clearly a problem with the new office, right? They go through a few environments to get to their monitoring systems, so I log in to the remote access VPN for the first one and confirm the first firewall they hit sees their traffic and isn’t dropping it.
       
       I go to log in to the remote access VPN for the second environment, where the monitoring systems actually live. I’m able to start the connection, but it never prompts me for my credentials, and the tunnel never comes up. Huh. That’s weird.
       
       Well, I’ll just get in through the DR version of the second environment. Connection works and it prompts me for my credentials, but it rejects them. I try again, in case I made a mistake entering the passphrase for my key, but it’s still rejected. Huh. That’s weird.
       
       I eventually find a working way in. I’m able to ping all the relevant systems, I’m able to make TCP connections via telnet, but trying to actually use a service like SSH or MSRDP just hangs. But wait! I can connect to my firewalls via SSH! So what’s common among the broken systems?
       
       All the broken systems are VMs. I start testing connections to other things which I know are VMs. They all behave the same. Ping works, TCP connections work, but data over the connections gets no response.
       
       I bring in the virtualization team. Some of us drive in to the datacenter hosting the VMs giving us trouble. Someone quickly realizes the single SAN hosting all of the VMs’ drives was up, but wasn’t responding to storage requests.
       Effectively the drive had been pulled out of every single VM. Now we have an explanation for why all the VMs seem to be broken.
       
       With most operating systems, the network stack is wired in RAM and can’t be swapped out. The network stack handles responding to pings and opening TCP connections on listening ports. Once a TCP connection is opened, it requests a copy of the listening service from storage to handle the connection. With storage no longer responding, the network stack never gets the copy of the service to handle the connection, so data doesn’t work.
       
       Why couldn’t I connect to the second VPN endpoint? Well, some people in charge made very bad decisions. They had decided that since VMs are the future, the VPN endpoints in that facility should be moved from dedicated hardware to VMs stored on the SAN. They hadn’t gotten to the first VPN endpoint yet, but that environment wasn’t allowed to connect in to the second environment.
       
       Okay, but I could connect to the other site’s VPN endpoint, and the other site didn’t have any problems. Why didn’t it accept my credentials? Well, some people in charge made very bad decisions (you may be noticing a theme!). All authentication was run through some VMs which were stored on the SAN. The VPN boxes in the working location were set to monitor the health of the authentication boxes in the failed location by pinging them. As long as they responded to ping, they were good, so the VPN boxes wouldn’t fail over to using their local authentication boxes. And a computer with its drive pulled can still respond to ping with just the network stack in RAM.
       
       Once we realized what was going on, we physically connected to the WAN routers and added routes to prevent the two sites from reaching each other’s authentication boxes. Presto! We could now log in via the DR environment as normal. The other infrastructure teams were then able to start digging into their parts.
       
       But why is the SAN unresponsive?
       Turns out this particular SAN vendor had an option for what to do under certain failure conditions: it could fail read-only or fail completely silent. This one was set to fail silent, and it had filled up.
       
       I wasn’t directly involved in fixing the SAN. I know the manager over the SAN team had been sounding the alarm for months before it filled. I also know there were multiple levels of bad configuration, such as more space offered by LUNs than the SAN could physically provide.
       
       Big takeaways:
       
       1. Make sure your access to fix a system doesn’t depend on that system. It’s really easy to accidentally introduce dependency cycles, and it takes constant work to avoid them.
       
       2. Superficial tests like whether you can ping something can’t detect some pretty major failures. More significant tests are more likely to notice the problem.
       
       3. When something is critical to an environment, maybe have more than one of them? The SAN had internal redundancy to deal with faulty drives and so on, but all the storage was in one giant pool. Multiple SAN systems can provide a bulkhead such that breaking one would not break all VMs.
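       For the second takeaway, here is a rough sketch of the difference between a shallow and a slightly deeper health check. Both function names are invented for this example. The deeper check assumes a protocol where the server speaks first, such as the SSH version banner; anything where the client speaks first would need a real request instead.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Shallow check: True as soon as the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_answers(host: str, port: int, timeout: float = 3.0) -> bool:
    """Deeper check: True only if the service sends at least one byte.

    The timeout passed to create_connection also applies to recv, so a
    service that accepts the connection but never speaks counts as down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return len(conn.recv(64)) > 0
    except OSError:
        return False
```

       Against a machine in the state described above, port_open would be expected to keep returning True, because the handshake is handled by the network stack alone, while service_answers would time out and return False. A monitor built on the second kind of check has a chance of triggering the failover.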
       
 (DIR) Post #AvzqlFIyL5cqM9cGo4 by bob_zim@infosec.exchange
       2025-07-10T17:01:01Z
       
       0 likes, 0 repeats
       
       @futurebird A surprising number of address verification systems don’t support a field long enough for my street address. They change it from “9876 Town Sq. Dr. Apt. 1234” (my actual street name is longer, but similar) to “9876 Town Square Drive, Apartment 123”, and the 4 no longer fits in the field. Can’t type it in manually, and the verification system won’t let me abbreviate any of the words to make room.
       
 (DIR) Post #AyjqV38iFZHiRoggcq by bob_zim@infosec.exchange
       2025-09-30T17:18:57Z
       
       0 likes, 0 repeats
       
       @futurebird For some people, absolutely. For other people, it’s more an excuse to just keep messing things up, much like “we’ll solve climate change with carbon capture!”
       
 (DIR) Post #AzhaPzjbdc93mtGGhc by bob_zim@infosec.exchange
       2025-10-29T13:01:29Z
       
       0 likes, 0 repeats
       
       @futurebird I like the calculus, and it doesn’t really work without positive and negative numbers which are exactly equal to zero (infinitesimals).
       
 (DIR) Post #B30qoslMa877PDXOue by bob_zim@infosec.exchange
       2026-02-05T13:41:06Z
       
       0 likes, 0 repeats
       
       @futurebird Fidlock’s SNAP 25 Electrified is *serious* overkill for keeping gloves together, but it’s a magneto-mechanical fastener with an integrated switch. They offer closed-when-fastened or open-when-fastened models, but the signal is only available on one side. Wouldn’t work for this directly, but could be a starting point for research. Here’s a datasheet for one line: https://www.fidlock.com/components/sites/default/files/Infosheet-F8043-SNAP-buckle-25-electrified.pdf