[HN Gopher] Image unshredding using a TSP solver
___________________________________________________________________
Image unshredding using a TSP solver
Author : ingve
Score : 141 points
Date : 2021-07-02 16:16 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| einpoklum wrote:
| The author's latest commit: @robinhouston The
| word "perfectly" was used too many times
|
| :-P
| magnimeelcul wrote:
| Cool read! Though ordering the columns such that the cumulative
| cobsecutive dissimilaritiea are minimized is not what you really
| want. Think of a shredded image of a checkerboard: with this
| objective you would not obtain the original but rather a version
| where there are contigous black/white steps on the left and on
| the right, with a discontinuity in the middle.
|
| That is to say: by reducing the objective to this you are
| assuming the image is continuous on the horizzontal axis, while
| in image processing images are usually supposed to be only
| piecewise continuous.
| stavros wrote:
| But there's no way to reconstruct a checkerboard image without
| knowing what it represented originally.
| magnimeelcul wrote:
| Without any a-priori I guess that's true, i.e. the problem
| they want to solve isn't guaranteed to have a solution
| ivanpondal wrote:
| I really liked the TSP approach when this came out. Turns out it
| works pretty well on text too:
| https://github.com/ivanpondal/text-unshredding
| sllabres wrote:
| This is quite similar to the early decryption of the analog
| Nagravision TV encryption system
| https://en.wikipedia.org/wiki/Nagravision worked. Did it myself
| at the time, but on to slow hardware/to little optimized
| software. The decryption took much to long to be usable, but it
| was fun to see that it worked quite fine
| WalterBright wrote:
| If you're going to use a shredder:
|
| 1. use a cross-cut shredder
|
| 2. mix many different shred jobs in the same bag
|
| 3. mix the contents of multiple bags
|
| 4. deposit different bags in different trash runs
|
| 5. just burn it
| jhncls wrote:
| Would this algorithm allow for an image version of the Burrows-
| Wheeler transform? So, sorting image rows and columns using a
| compressibility criterion.
|
| [0]
| https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...
| hinkley wrote:
| I don't suppose there's a way to process paper into 'pellets' fit
| for recycling that make this whole process untenable.
|
| Paper fiber length dictates a lot about its reuse. The bigger the
| chunks the smaller the combinatorics for reversing the
| 'algorithm'. Big paper mache blobs might fare better. From a
| recycling standpoint a machine that tears the paper instead of
| cutting it would do better, but the uniqueness of the edges
| probably reduces the problem space.
|
| If someone invents a fast paper pulping machine that can mount on
| a small truck that doesn't require a commercial vehicle license,
| I bet you could sell loads of them.
|
| Whatever happened to that team who used microcavitation to blast
| the ink off of paper for reuse? Probably enough solvents left
| behind to read the text like invisible ink, but as a first pass
| it might be good.
| jdthedisciple wrote:
| What would be an exemplary use case for this? Like, when do you
| ever have images in front of you that are shuffled in this way?
| clipradiowallet wrote:
| Well, imagine someone dumpster diving at a
| bank/clinic/insurance provider that doesn't use a document
| destruction service. Probably plenty of PII/PHI could be
| reconstructed once the shredded fragments were singulated and
| scanned.
| IncRnd wrote:
| We don't shred digital images in RAM in that way, but we can
| scan & reconstruct physical images that have been shredded or
| torn.
|
| You can paste together possible scenes to construct side-
| scrolling game scenery in realtime.
|
| By applying a few insights we can phase one image into a hole
| that is in another image. This is similar to randomizing an
| array by creating a second array with random values, then
| sorting them in tandem, except we use the TSP with expected
| weightings.
|
| There are lots of use cases for this.
| est31 wrote:
| In the virtual RAM accessible to the process the image might
| be continuous, but its physical representation could be
| different, as the operating system might use different areas
| from the physical RAM to expose it to the process. If you
| dump a computer's RAM e.g. in a forensic setting, you might
| attain such a state.
|
| Same goes for file systems as well as the wear leveling layer
| of SSDs.
| IncRnd wrote:
| exactly!
| tylfin wrote:
| I could imagine a civil or criminal case relying on shredded
| documents as a source of evidence.
| klyrs wrote:
| When you dumpster dive your competitor looking for trade
| secrets?
| the8472 wrote:
| In the 90s and 00s analog pay tv used to be "encrypted" like
| this. IIRC software decryption relied on deriving the weak
| shuffling key by manually reordering image rows and then using
| that to reverse the key so the next frames could be decrypted
| more reliably. Doing that kind of video processing in realtime
| and in software was close to CPU limits.
|
| https://en.wikipedia.org/wiki/Nagravision
| aliasEli wrote:
| This might even have an application for real documents. There are
| police cases where some suspects have tried to destroy evidence.
| Manually reconstructing the original takes a lot of effort. I
| expect that the CIA and NSA already have software for it, but I
| have never heard of it being used in a police case.
| perihelions wrote:
| Germany has run a program for decades to recover the Stasi's
| paper documents, the bulk of which it shredded at the fall of
| the GDR. People can sometimes get their own Stasi files, if
| they're recovered.
|
| https://www.dw.com/en/how-ai-could-solve-a-600-million-piece...
|
| https://www.wired.com/2008/01/ff-stasi/
|
| https://boingboing.net/2018/01/09/truth-reconciliation-and-t...
|
| Surprisingly, there's never been an HN thread about this (as
| far as I can tell).
|
| From the _Wired_ link: _" Today, the study in Poppe's Berlin
| apartment is lined floor to 12-foot ceiling with bookshelves
| full of volumes on art, literature, and political science. But
| one shelf, just to the left of her desk, is special. It holds a
| pair of 3-inch-thick black binders -- copies of the most
| important documents in Poppe's secret police files. This is her
| Stasi shelf."_
| odysseus wrote:
| The painstaking process to put back together shredded Stasi
| documents is mentioned at the end of The Lives of Others:
| https://www.imdb.com/title/tt0405094/
|
| Great movie, by the way.
| einpoklum wrote:
| I really didn't like that movie, focusing on people who are
| wholly uninteresting politically; presenting a sort of a
| "shocked and chagrined" attitude to people being spied on
| by the government; and treating West Germany as some sort
| of paradise.
|
| I would recommend Ken Loach's Fatherland instead; but -
| only until when Dritterman arrives in the UK (the end of
| the film is kind of messed up).
| CamperBob2 wrote:
| The political class is more than capable of looking after
| itself, thanks. It's the common people who have the most
| to fear, and the movie does a great job at underscoring
| that very point.
| airstrike wrote:
| _ _Fantastic_ _ movie, I just wanted to add.
| ampdepolymerase wrote:
| Current office shredders are almost all capable of double
| shredding but considering how easy it is to reverse by OP's
| project, it may be necessary to implement further measures for
| document destruction.
|
| A laser or enzymic based method could work but neither are
| particularly office-friendly.
| the8472 wrote:
| https://a.pomf.cat/uayjgu.jpg
| ChrisMarshallNY wrote:
| I used to work for a defense contractor, in the mid-1980s.
|
| Once a week, this big-ass tandem truck would park outside the
| entrance, and the janitors would carry out boxes of papers,
| under armed guard.
|
| The truck had a huge shredder (crosscut), and also a big
| kiln.
|
| Once the paper was shredded, it was burned, and the ashes
| were shredded, just to be sure.
|
| Then, the truck would take the ashes away; never to be seen
| again.
| secondcoming wrote:
| Could someone invent a shredder that pre-analyses an image of
| what it's shredding and then shreds in a manner that makes
| reconstruction infeasible due to the increased likelihood of
| false positives? (e.g. the flipped and mirrored images in OP)
| jhayward wrote:
| OP's method relies on the permutation of elements in (columns
| or rows) to be in the same order for each (column or row).
|
| If you're just handed a pile of confetti you don't have that
| same correspondence.
| wongarsu wrote:
| Also, you generally have written correspondence, not
| images. I doubt that the similarity metric of OPs method
| produces reasonable results on text, and a good similarity
| metric for text would be much more complicated.
|
| On the other hand physical shredders typically have other
| weaknesses, for example if you don't stir the shredded
| result you have a high correlation between original the
| current and original relative position of the individual
| pieces.
| aliasEli wrote:
| With written text, you would expect that at least some
| the letters from one shred would continue on its
| neighbors. I don't see why the similarity metric would
| not work. Also normally you have a pretty high contrast
| between the letter and the background, unlike the image
| that was used in the example.
| jhgb wrote:
| If you have a language model, then sequences of
| characters appearing in your text will add further
| information to your edge matches.
| [deleted]
| jpalomaki wrote:
| DARPA ran a competition on this problem around 10 years ago [1]
| and page from Darpa archive [2]
|
| [1] https://en.wikipedia.org/wiki/DARPA_Shredder_Challenge_2011
| [2] https://archive.darpa.mil/shredderchallenge/index.html
| sgk284 wrote:
| This was a hiring puzzle from the Instagram team in 2012, from
| back before Facebook acquired them: https://instagram-
| engineering.com/instagram-engineering-chal...
|
| I remember having a lot of fun solving it, in Coffeescript (which
| was all the rage at the time), also with TSP but using the
| Pearson correlation coefficient as the similarity metric:
| https://gist.github.com/stevekrenzel/1505619
|
| Here's a little demo of it in action:
| http://krenzel.org/files/insta/
| speedgoose wrote:
| I like how unicorns can do such hiring challenges while normal
| companies are desperate for people to apply. I think my current
| company would hire anyone who can breath and write 3 lines of
| code per month, but no one fitting these criteria applies.
| onion2k wrote:
| I can write 4 lines of code per month, so I guess I'm
| overqualified.
| speedgoose wrote:
| Do they work ?
| reificator wrote:
| Oh, now we're moving the goalposts eh?
| malux85 wrote:
| On my machine, yes
| kevin_thibedeau wrote:
| They've got +3 on StackOverflow so fully peer reviewed.
| satya71 wrote:
| Are you hiring on-site or remote? Which country?
| gccs wrote:
| There is a shortage of engineers because companies don't want
| to pay them what their worth.
| hn_throwaway_99 wrote:
| This gets spouted all the time on HN, and I find it rarely
| to be true, or at least a gross oversimplification.
|
| In my experience what usually happens is a programmer's
| "worth" is hugely dependent on the scale of the
| organization, moreso than nearly every other individual
| contributor-level role, which means that smaller orgs have
| an extremely difficult time competing with larger orgs.
|
| For example, suppose at some company there is a business
| process that generates $1 billion in annual revenue, and a
| programmer has implemented something that can increase that
| by, say, .2%. So the programmer's "worth" in that situation
| is 2 million dollars.
|
| A much smaller company only does $10 million in annual
| revenue, so the commensurate .2% improvement is only worth
| 20k.
|
| This is a gross oversimplification, of course, but it
| really gets to the heart of way the FAANGS can pay such
| huge salaries and suck up a lot of the best talent. It's
| not that all the other companies are being stingy, it's
| just that in many cases they _can 't_ pay a programmer
| anywhere near what a FAANG can because that programmer just
| can't generate that much business value at a smaller
| company.
| WalterBright wrote:
| The only reason there's a shortage of housing is because
| people don't want to pay what housing is worth.
| michaelcampbell wrote:
| What's the work, and the pay? Those 2 things will play big
| roles more than a dearth of candidates. Told my son for years
| if you want to make more money, find something to do that no
| else wants to. Turns out the same is on the hiring side.
| speedgoose wrote:
| Very average IT work in healt-care. Glorified CRUD plus
| some. Pay is average too.
| infinite8s wrote:
| What does average mean in this case?
| porphyra wrote:
| I think public hiring challenges like this are less of a
| barrier that filters out candidates, and more of an attractor
| that increases the number of applicants by "nerd sniping"
| people.
| speedgoose wrote:
| You need a strong brand and good marketing to attract
| people with a challenge. A normal company has difficulties
| to reach people.
| ezekg wrote:
| Reminds me how much I loved CoffeeScript. The code really is
| beautiful.
| nayuki wrote:
| Discussions back in 2016:
| https://news.ycombinator.com/item?id=12670997 ;
| https://news.ycombinator.com/item?id=12667611
| whazor wrote:
| I used TSP to optimally stack different types of bowls. I
| measured all the different pairs of bowls and searched for the
| shortest route through all the bowls we have. I found that you
| can split the stacks however you want, I also had an algorithm
| for that.
| Conlectus wrote:
| Is the algorithm different from a simple sorted order in this
| case? The difficulty with TSP is that you have to return to the
| start, which wouldn't seek to be the case with stacled bowls,
| unless you're stacking them in a ring somehow.
| SavantIdiot wrote:
| I wrote a program to do this two decades ago. I shredded a black
| and white text document (long shreds, not the new shredders that
| cut into confetti), scanned all the strips, front and back, and
| created a lookup value for each edge based on the pixel position.
| The biggest problem I had was the shredder blade removed too much
| material. This caused a continuous stroke to be fatter on one
| side than the other (imagine if the descender in a lowercase "p"
| was on the blade's path).
| nielsbot wrote:
| I wonder if a high resolution photo of all the shreds
| (flattened out) would also work and be easier?
| hackcasual wrote:
| DARPA had a challenge for this 10 years ago
| https://gizmodo.com/darpas-almost-impossible-challenge-to-re...
| SavantIdiot wrote:
| That's awesome! I had no idea it existed. Too bad the top
| level URL no longer exists, but now I'm eager to see if there
| are solutions online today.
___________________________________________________________________
(page generated 2021-07-02 23:00 UTC)