[HN Gopher] Deprank: Use PageRank to find the most important fil...
       ___________________________________________________________________
        
       Deprank: Use PageRank to find the most important files in your
       codebase
        
       Author : phpnode
       Score  : 148 points
       Date   : 2022-07-01 09:49 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | londons_explore wrote:
       | I can imagine that "utils.js" and "math.h" rank highest, while
       | "main.c" will probably have the lowest rank.
       | 
       | Doesn't sound like the ranking metric I'd want for code search
       | results...
        
         | rullelito wrote:
         | I can imagine that if you reverse all dependency directions and
         | run PageRank on that as well, you could create a better ranking
         | by calculating max(rank, revrank).
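A minimal TypeScript sketch of the max(rank, revrank) idea. It assumes the dependency graph is a plain object that maps each file to the files it imports. The `Graph` type, the helper names and the file names are hypothetical, and this is a generic PageRank, not deprank's implementation. Running PageRank on the reversed graph gives files like main.c a high score. Such files depend on everything but nothing depends on them.

```typescript
// Hypothetical dependency graph: each key lists the files it imports.
// Assumption: every imported file also appears as a key.
type Graph = Record<string, string[]>;

// Plain PageRank by power iteration. Rank flows from a file to the files it
// imports, so heavily depended-upon files end up with high scores.
function pageRank(graph: Graph, damping = 0.85, iterations = 100): Map<string, number> {
  const nodes = Object.keys(graph);
  const n = nodes.length;
  let rank = new Map<string, number>(nodes.map((v): [string, number] => [v, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>(nodes.map((v): [string, number] => [v, (1 - damping) / n]));
    for (const v of nodes) {
      const out = graph[v];
      const share = damping * rank.get(v)!;
      // A file with no imports spreads its rank evenly over all files.
      const targets = out.length > 0 ? out : nodes;
      for (const w of targets) next.set(w, next.get(w)! + share / targets.length);
    }
    rank = next;
  }
  return rank;
}

// Flip every edge: "a imports b" becomes "b points to a".
function reverse(graph: Graph): Graph {
  const rev: Graph = {};
  for (const v of Object.keys(graph)) rev[v] = [];
  for (const v of Object.keys(graph)) {
    for (const w of graph[v]) rev[w].push(v);
  }
  return rev;
}

// The suggested combination: max(rank, revrank) per file.
function combinedRank(graph: Graph): Map<string, number> {
  const rank = pageRank(graph);
  const revrank = pageRank(reverse(graph));
  const result = new Map<string, number>();
  rank.forEach((r, v) => result.set(v, Math.max(r, revrank.get(v)!)));
  return result;
}

// main.c depends on everything; math.c is depended upon by everything.
const graph: Graph = {
  "main.c": ["app.c", "utils.c", "math.c"],
  "app.c": ["utils.c", "math.c"],
  "utils.c": ["math.c"],
  "math.c": [],
};

combinedRank(graph).forEach((score, file) => console.log(file, score.toFixed(4)));
```

This example is symmetric, so main.c and math.c come out roughly tied under the combined score, which is the effect the suggestion is after. It is an open question whether max works better than another combiner, such as a weighted sum.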
        
         | charcircuit wrote:
         | It seems fine to me. When you search for a function definition
         | it will tend to show you the most commonly used one.
        
       | dan-dev wrote:
       | Great project!
       | 
       | One feature request: running the npx command searched only for
       | the .js files, not the .ts files. When I built deprank locally
       | with yarn, it also showed the .ts files. After looking at
       | dependency-cruiser, I figure it has to do with which TypeScript
       | compilers are available where.
       | 
       | It would be great if the npx command you provide in your readme
       | would work regardless of my local setup - dependency-cruiser has
       | documentation and one example of a suitable npx command here:
       | https://github.com/sverweij/dependency-cruiser/blob/develop/...
       | 
       | My suggestion would be to check whether any TypeScript extension
       | is part of the extension option (e.g. --ext=".js,.jsx, .ts, .tsx")
       | and only then do the magic needed to also show .ts files.
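Here is a sketch of the suggested check. It assumes the `--ext` value is a comma-separated list like the one quoted above. `needsTypeScript` and the extension set are hypothetical names and are not part of deprank or dependency-cruiser. The sketch does not show the TypeScript setup that would follow the check.

```typescript
// Hypothetical list of extensions that would require TypeScript support.
const TS_EXTENSIONS = new Set([".ts", ".tsx", ".mts", ".cts"]);

// Decide from a comma-separated --ext value whether TypeScript files are requested.
function needsTypeScript(extOption: string): boolean {
  return extOption
    .split(",")
    .map((ext) => ext.trim().toLowerCase()) // tolerate ".js, .ts" with spaces
    .filter((ext) => ext.length > 0)
    .map((ext) => (ext.startsWith(".") ? ext : "." + ext)) // accept "ts" and ".ts"
    .some((ext) => TS_EXTENSIONS.has(ext));
}

console.log(needsTypeScript(".js,.jsx, .ts, .tsx")); // true
console.log(needsTypeScript(".js,.jsx")); // false
```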
        
         | phpnode wrote:
         | Thanks for this comment, I'll look into fixing it. In the
         | meantime, I'm curious whether it works when you install deprank
         | directly? e.g.
         |
         |     yarn add deprank
         |     yarn run deprank ./src
        
           | dan-dev wrote:
           | Thank you! :)
           | 
           | It worked when installing inside the project folder and did
           | not work when installing outside the project folder:
           |     // Success: Running in the project folder.
           |     me:~/deprank$ yarn add deprank
           |     yarn add v1.22.19
           |     warning ../package.json: No license field
           |     [1/5] Validating package.json...
           |     [2/5] Resolving packages...
           |     [3/5] Fetching packages...
           |     [4/5] Linking dependencies...
           |     [5/5] Building fresh packages...
           |     success Saved lockfile.
           |     success Saved 1 new dependency.
           |     info Direct dependencies
           |     +- deprank@0.1.1
           |     info All dependencies
           |     +- deprank@0.1.1
           |     $ tsdx build
           |     @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
           |     @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
           |     Creating entry file 602 ms
           |     Building modules 1.4 secs
           |     Done in 5.95s.
           |     me:~/deprank$ yarn run deprank src/
           |     yarn run v1.22.19
           |     warning ../package.json: No license field
           |     $ /home/dan/deprank/node_modules/.bin/deprank src/
           |     | Filename     | Lines | Dependents | PageRank |
           |     ------------------------------------------------
           |     | src/index.ts | 280   | 0          | 1.000000 |
           |     Done in 0.55s.
           |     me:~/deprank$ yarn run deprank .
           |     yarn run v1.22.19
           |     warning ../package.json: No license field
           |     $ /home/me/deprank/node_modules/.bin/deprank .
           |     | Filename                           | Lines | Dependents | PageRank |
           |     ----------------------------------------------------------------------
           |     | fixtures/core.js                   | 3     | 1          | 0.191112 |
           |     | fixtures/utils.js                  | 4     | 3          | 0.180576 |
           |     | fixtures/user/user.js              | 4     | 1          | 0.088966 |
           |     | src/index.ts                       | 280   | 1          | 0.069599 |
           |     | fixtures/todo.js                   | 6     | 1          | 0.060405 |
           |     | fixtures/user/index.js             | 1     | 1          | 0.060405 |
           |     | dist/deprank.cjs.development.js    | 829   | 1          | 0.053610 |
           |     | dist/deprank.cjs.production.min.js | 2     | 1          | 0.053610 |
           |     | fixtures/concepts.js               | 4     | 1          | 0.053610 |
           |     | dist/deprank.esm.js                | 820   | 0          | 0.037621 |
           |     | dist/index.d.ts                    | 36    | 0          | 0.037621 |
           |     | dist/index.js                      | 8     | 0          | 0.037621 |
           |     | fixtures/index.js                  | 4     | 0          | 0.037621 |
           |     | test/deprank.test.ts               | 28    | 0          | 0.037621 |
           |     Done in 0.60s.
           |
           |     ----------------------------------------------------------------------
           |
           |     // Failure: Running outside of the project folder:
           |     me:~$ yarn add deprank
           |     yarn add v1.22.19
           |     warning package.json: No license field
           |     warning package-lock.json found. Your project contains lock files generated by tools other than Yarn. It is advised not to mix package managers in order to avoid resolution inconsistencies caused by unsynchronized lock files. To clear this warning, remove package-lock.json.
           |     warning No license field
           |     [1/4] Resolving packages...
           |     [2/4] Fetching packages...
           |     [3/4] Linking dependencies...
           |     [4/4] Building fresh packages...
           |     success Saved lockfile.
           |     warning No license field
           |     success Saved 3 new dependencies.
           |     info Direct dependencies
           |     +- deprank@0.1.1
           |     +- node@18.4.0
           |     info All dependencies
           |     +- deprank@0.1.1
           |     +- node-bin-setup@1.1.0
           |     +- node@18.4.0
           |     Done in 4.95s.
           |
           |     me:~$ yarn run deprank deprank/
           |     yarn run v1.22.19
           |     warning package.json: No license field
           |     $ /home/me/node_modules/.bin/deprank deprank/
           |     | Filename                                   | Lines | Dependents | PageRank |
           |     ------------------------------------------------------------------------------
           |     | deprank/fixtures/core.js                   | 3     | 1          | 0.223479 |
           |     | deprank/fixtures/utils.js                  | 4     | 3          | 0.211161 |
           |     | deprank/fixtures/user/user.js              | 4     | 1          | 0.104035 |
           |     | deprank/fixtures/todo.js                   | 6     | 1          | 0.070637 |
           |     | deprank/fixtures/user/index.js             | 1     | 1          | 0.070637 |
           |     | deprank/dist/deprank.cjs.development.js    | 829   | 1          | 0.062691 |
           |     | deprank/dist/deprank.cjs.production.min.js | 2     | 1          | 0.062691 |
           |     | deprank/fixtures/concepts.js               | 4     | 1          | 0.062691 |
           |     | deprank/dist/deprank.esm.js                | 820   | 0          | 0.043993 |
           |     | deprank/dist/index.js                      | 8     | 0          | 0.043993 |
           |     | deprank/fixtures/index.js                  | 4     | 0          | 0.043993 |
           |     Done in 0.28s.
        
       | xcambar wrote:
       | Off-topic: I've read the name as if it were "de-prank".
        
         | krylon wrote:
         | As did I, resulting in slight disappointment when I eventually
         | figured out what this was actually about. There sure are some
         | code bases out there that could use a bit of de-pranking. ;-)
        
         | einszwei wrote:
         | I read it as "deep-rank" but I guess the intended reading is
         | "dep-rank".
        
           | pindab0ter wrote:
           | Out of curiosity: how do you read "dep" with one 'e' as
           | "deep"?
        
             | manfre wrote:
             | Humn brain is grat at patrn matching nd haz auto correct.
        
             | voidfunc wrote:
             | I also read it as deeprank
        
             | tessierashpool wrote:
             | deprogram
             | 
             | depart
             | 
             | depend
             | 
             | deploy
             | 
             | deport
             | 
             | depose
             | 
             | deprive
             | 
             | depraved
             | 
             | depressurize
             | 
             | depress
             | 
             | depopulate
             | 
             | depoliticize
             | 
             | deposit
        
       | IvyMike wrote:
       | I've always wanted to see my Reddit (and I guess HN) karma
       | evaluated as my pagerank of comments. (Yes, lots of downsides--
       | people with high karma have more "power". Yes it would be
       | instantly gamed. I still want to see it.)
       | 
       | It would also be neat to see a Reddit-like website with multiple
       | formulas for "karma" all evaluated at the same time, like IVYMIKE
       | (PageRank: 1234, Classic: 1656, Experimental: 78)
        
       | bee_rider wrote:
       | It would be sort of funny to apply some reordering algorithms to
       | the dependency graph, and use that to refactor a project. Maybe
       | nested dissection. Or some fancy hypergraph reordering...
        
       | zeroth32 wrote:
       | It would be very useful as an IntelliJ IDEA plugin: it could hook
       | into IDEA's AST and work in any language. And not just for files,
       | but for methods. And it could be contextual, depending on edit
       | history and the current context.
       |
       | Probably a great tool for getting started quickly on a new project.
        
       | pbackup12345 wrote:
       | Well, technically this is a godsend. However, when I try to run it
       | on some of my projects, both JavaScript and TypeScript, it comes
       | up totally empty as of now. Bug?
        
         | phpnode wrote:
         | Hmm, sounds like it. Please could you open a GitHub issue with
         | some more info?
         | 
         | edit: I've just fixed an issue that looks similar to this,
         | please run with `npx deprank@0.1.1 ./path/to/folder` and let me
         | know if you're still having problems
        
           | pbackup12345 wrote:
           | Cool. That was it. Works now. Hats off to you. This has more
           | uses than just TypeScript conversion. For example, in React
           | apps (and probably in other apps, too) you can spot the
           | dependencies that break your code splitting.
        
       | toxik wrote:
       | This becomes impossible to do correctly because of the halting
       | problem, interestingly. For example, suppose a routine calls F in
       | a loop for most of its work, then at the end takes the square
       | root via sqrtf(). Clearly the number of calls matters for the edge
       | weight in the call graph, but this tool would weight F and sqrtf
       | equally.
       |
       | I suppose you could do it by sampling: then you just have to look
       | at the sample distribution, though that would show you the graph
       | weighted by cumulative execution time per routine.
       | 
       | As they say though, never let perfect be the enemy of good. Neat
       | idea.
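This sketch makes the F-versus-sqrtf point concrete by counting calls at runtime. It wraps each function in a counter. That is instrumentation, a simpler stand-in for the sampling approach suggested above. `counted`, `callCounts` and the routines are hypothetical. A static view sees one call site per routine, so each gets weight 1. At runtime, F is called n times.

```typescript
// Hypothetical instrumentation: runtime call counts per function name.
const callCounts = new Map<string, number>();

// Wrap a function so that every call increments its counter.
function counted<A extends unknown[], R>(name: string, fn: (...args: A) => R): (...args: A) => R {
  callCounts.set(name, 0);
  return (...args: A): R => {
    callCounts.set(name, (callCounts.get(name) ?? 0) + 1);
    return fn(...args);
  };
}

// Stand-ins for the routines in the example above.
const F = counted("F", (x: number) => x * x);
const sqrtf = counted("sqrtf", (x: number) => Math.sqrt(x));

function routine(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) acc += F(i); // one call site, n calls
  return sqrtf(acc); // one call site, one call
}

// What a static reference count sees: one call site each.
const staticWeights = new Map<string, number>([["F", 1], ["sqrtf", 1]]);

routine(1000);
console.log("static:", Array.from(staticWeights)); // F: 1, sqrtf: 1
console.log("runtime:", Array.from(callCounts)); // F: 1000, sqrtf: 1
```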
        
         | lpapez wrote:
         | I think you are mixing performance analysis with dependency
         | analysis which is the point of the project. The sampling you
         | are describing is commonly done by tools called "sampling
         | profilers".
        
         | enneff wrote:
         | This actually has nothing to do with how often the code is
         | executed. Code is ranked by its referents.
        
           | toxik wrote:
           | Right, which is exactly what I pointed out. The true call
           | graph can only be obtained by basically running the program.
           | 
           | Counting references is clearly a compromise here. To see its
           | drawbacks, consider indirection and dynamism.
        
             | naniwaduni wrote:
             | This reaction is kind of like reading about a pre-election
             | poll where the participants were selected by pulling them
             | off 86th street at 11 am, and objecting that the
             | percentages shouldn't be presented with more than two sig
             | figs with a sample size of 100.
             | 
             | You're not, strictly speaking, wrong. But the methodology
             | is already known to be deeply compromised, so your
             | objection is kind of out there.
        
               | toxik wrote:
               | It was an observation that the choice to use references
               | is actually a must, because of the halting problem, which
               | I found interesting.
               | 
               | Not everything is an argument.
        
       | rajnathani wrote:
       | This should be a "Show HN"; you'd likely get more coverage, too.
       | 
       | Question: Which package do you use for the linear algebra
       | calculation? I couldn't figure it out by skimming through your
       | source code.
        
         | charcircuit wrote:
         | >Which package do you use for the linear algebra calculation?
         | 
         | He doesn't.
         | https://github.com/codemix/deprank/blob/main/src/index.ts#L1...
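The link points into deprank's own source, which is not reproduced here. This sketch is a generic illustration of why no linear algebra package is needed: it computes PageRank by power iteration over plain arrays. The function, its defaults (damping 0.85, tolerance 1e-10) and the three-file example are assumptions, not deprank's code.

```typescript
// PageRank by power iteration with plain arrays: no linear algebra library.
// edges[i] lists the indices of the nodes that node i points to (here: the
// files it imports), so rank flows toward files that are depended upon.
function pageRank(
  edges: number[][],
  damping = 0.85,
  tolerance = 1e-10,
  maxIterations = 1000,
): number[] {
  const n = edges.length;
  let rank = new Array<number>(n).fill(1 / n);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Rank held by nodes with no outgoing edges is spread uniformly.
    let danglingMass = 0;
    for (let i = 0; i < n; i++) if (edges[i].length === 0) danglingMass += rank[i];

    const next = new Array<number>(n).fill((1 - damping) / n + (damping * danglingMass) / n);
    for (let i = 0; i < n; i++) {
      const out = edges[i];
      for (const j of out) next[j] += (damping * rank[i]) / out.length;
    }

    // Stop once the L1 change between iterations drops below the tolerance.
    let delta = 0;
    for (let i = 0; i < n; i++) delta += Math.abs(next[i] - rank[i]);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}

// Example: 0 = main.js, 1 = app.js, 2 = utils.js (imported by both others).
const files = ["main.js", "app.js", "utils.js"];
const ranks = pageRank([[1, 2], [2], []]);
files.forEach((f, i) => console.log(f.padEnd(10), ranks[i].toFixed(6)));
```

Each iteration only walks the edge lists, so its cost grows with the number of files plus the number of imports. There is no n-by-n matrix.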
        
       | zetalemur wrote:
       | That's actually a nice idea. I guess we will see more software in
       | the source code dependencies analysis space. There's so much code
       | and often it's nice to have some kind of metrics (LOC, PageRank,
       | ...) to get a grasp of what's important in a codebase.
        
       | lampshades wrote:
       | Just started a new job, and having a filetype-agnostic version of
       | this would be immensely helpful for learning the codebase.
        
       | bjornsing wrote:
       | > We define importance as those files which are directly or
       | indirectly depended upon the most by other files in the codebase.
       | 
       | I honestly would have expected the opposite definition. Maybe I'm
       | kind of old school, but in a well-architected large C program, for
       | instance, "main.c" tends to depend (directly or indirectly) on
       | every other compilation unit, while there are no dependencies in
       | the other direction. And I think "main.c" should be seen as
       | "important".
       |
       | Why would this not be true for JavaScript or TypeScript?
        
         | noveltyaccount wrote:
         | I think I'm reading this opposite from you.
         | 
         | `main.c` _depends upon_ everything else, but is not _depended
         | upon_ by anything else. A file like `datatypes.c` might be
         | depended upon by multiple tiers of the application and be
         | referenced by dozens of files, making it have a high pagerank.
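This small sketch shows the edge direction described here. It counts direct dependents, that is, how many files import each file. This is similar in spirit to the Dependents column in deprank's output. The import map and file names are hypothetical.

```typescript
// Hypothetical import map: each file lists the files it depends upon.
const imports: Record<string, string[]> = {
  "main.c": ["server.c", "parser.c", "datatypes.c"],
  "server.c": ["parser.c", "datatypes.c"],
  "parser.c": ["datatypes.c"],
  "datatypes.c": [],
};

// Count direct dependents: how many files import each file.
function countDependents(graph: Record<string, string[]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const file of Object.keys(graph)) counts.set(file, 0);
  for (const file of Object.keys(graph)) {
    // Deduplicate so a file that imports the same dependency twice counts once.
    const unique = graph[file].filter((dep, i, all) => all.indexOf(dep) === i);
    for (const dep of unique) counts.set(dep, (counts.get(dep) ?? 0) + 1);
  }
  return counts;
}

// Prints one line per file: main.c 0, server.c 1, parser.c 2, datatypes.c 3
countDependents(imports).forEach((n, file) => console.log(file, n));
```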
        
           | mike_hock wrote:
           | Might be useful to navigate an unknown codebase. If you know
           | the codebase already, you should know what forms the core
           | that everything else depends upon.
           | 
           | The term "importance" might also be slightly misleading.
           | What's more important, the engine or the transmission? The
           | car needs both to drive.
        
         | Jensson wrote:
         | If you are looking for a utility function then it will likely
         | be found in a file that is imported in many other places. So
         | intuitively this would be a reasonable way to rank files/search
         | results when you are looking for something to import.
        
         | atwood22 wrote:
         | Another example is interface usage. You may have a commonly
         | used interface with only one implementation. The implementation
         | is the important part, but almost no usage depends directly on
         | the implementation.
        
           | yodon wrote:
           | If you are looking to introduce types into an untyped
           | codebase, as this project talks about, then you probably
           | don't have a lot of interfaces defined, and if you do have
           | things in your untyped codebase that are loosely analogous to
           | interfaces from a decoupling standpoint (such as facades and
           | factories) then this approach would advocate those are
           | sensible areas to focus your initial typing efforts.
        
         | dack wrote:
         | They mentioned using it to add types to a JavaScript project. I
         | can see why you'd start by adding types to the most-used files
         | first.
        
         | CapsAdmin wrote:
         | I agree, but seen through the lens of wanting to clean up a
         | codebase, this seems useful. The second paragraph on GitHub
         | states that it's specifically useful for converting JavaScript
         | to TypeScript as well.
         |
         | If you have a large, messy C program that you intend to clean
         | up, would you say main.c is the most important file?
         | 
         | I think it's an interesting idea that could work on top of this
         | utility to give you an overview of how messy the dependency
         | graph of a project is.
        
           | ape4 wrote:
           | If lots of functions depend on main.c then you have a problem
           | ;)
        
       | ape4 wrote:
       | Perhaps this could be paired with a static analysis of the code's
       | quality. Then you'd get the most used code with the worst
       | quality.
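This sketch shows one way to pair the two. It assumes you already have a PageRank score per file and a quality score in [0, 1] from some static analysis tool, where 1 means clean. The interface, the function name and all the numbers are made up for illustration. Multiplying importance by (1 - quality) is just one possible way to combine them.

```typescript
// Hypothetical per-file inputs.
interface FileScore {
  file: string;
  pageRank: number; // importance, e.g. as reported by deprank
  quality: number; // 0 = worst, 1 = best (hypothetical metric)
}

// Sort so that heavily depended-upon, low-quality files come first.
function worstImportantFiles(scores: FileScore[]): FileScore[] {
  const priority = (s: FileScore) => s.pageRank * (1 - s.quality);
  return [...scores].sort((a, b) => priority(b) - priority(a)); // copy, don't mutate input
}

// Illustrative numbers only.
const scores: FileScore[] = [
  { file: "core.js", pageRank: 0.19, quality: 0.9 },
  { file: "utils.js", pageRank: 0.18, quality: 0.4 },
  { file: "index.ts", pageRank: 0.07, quality: 0.2 },
];
console.log(worstImportantFiles(scores).map((s) => s.file)); // utils.js, index.ts, core.js
```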
        
       | avivo wrote:
       | Fascinating. "Deprank is particularly useful when converting an
       | existing JavaScript codebase to TypeScript. Performing the
       | conversion in strict PageRank order can dramatically increase
       | type-precision, reduce the need for any and minimizes the amount
       | of rework that is usually inherent in converting large
       | codebases." I wonder if the idea of using PageRank-style systems
       | for ~refactoring translates to other domains, e.g. organizational
       | or knowledge refactoring (à la
       | https://aviv.medium.com/when-we-change-the-efficiency-of-kno... )
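This sketch shows what "conversion in strict PageRank order" could look like in practice. It keeps only .js/.jsx files, sorts them by descending PageRank and lists the renames. The rows reuse scores from the deprank output posted earlier in the thread. `RankedFile` and `conversionPlan` are hypothetical names, and the renames are only printed, not performed.

```typescript
// Hypothetical row shape, mirroring the columns deprank prints.
interface RankedFile {
  file: string;
  pageRank: number;
}

// Plan a JS -> TS migration: most depended-upon files first, so files
// converted later can import dependencies that already have types.
function conversionPlan(rows: RankedFile[]): string[] {
  return rows
    .filter((r) => /\.jsx?$/.test(r.file)) // only files still in JavaScript
    .sort((a, b) => b.pageRank - a.pageRank) // highest PageRank first
    .map((r) => `${r.file} -> ${r.file.replace(/\.js(x?)$/, ".ts$1")}`);
}

const rows: RankedFile[] = [
  { file: "fixtures/core.js", pageRank: 0.191112 },
  { file: "fixtures/utils.js", pageRank: 0.180576 },
  { file: "fixtures/user/user.js", pageRank: 0.088966 },
  { file: "src/index.ts", pageRank: 0.069599 },
  { file: "fixtures/todo.js", pageRank: 0.060405 },
];
conversionPlan(rows).forEach((step, i) => console.log(`${i + 1}. ${step}`));
```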
        
       | hamasho wrote:
       | Great project! I can imagine this helps me get used to new
       | codebases.
       | 
       | As others have already mentioned, it would be better if it ranked
       | important files like main.js higher than other util files.
       | models.js and dataUtils.js might both be imported by 20 files, but
       | if dataUtils.js was created at the beginning of the project and
       | has barely been touched since, it should be ranked lower.
       |
       | I think it would be nice to rank frequently updated files higher,
       | just like real PageRank algorithms. If models.js is imported by
       | many files and updated frequently, it should be ranked at the
       | top. main.js is imported nowhere but updated frequently, so it
       | should also be ranked higher. dataUtils.js is imported by many
       | files but has not been updated for a long time, so it should be
       | lower.
       |
       | It's much more complicated than normal static analysis. Source
       | code is no longer the single source of truth; the VCS history
       | should also be considered. It would also have to deal with the
       | equivalent of SEO scammers: we all know people who commit again
       | and again with "Update, Fix bug, Fix typo, Fix lint, Update, Fix
       | format, ..." instead of a single commit with a meaningful message.
       | 
       | But if it's implemented properly, it helps explore unfamiliar
       | codebases much faster.
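This sketch shows one simple way to fold version-control history in. It blends normalized PageRank with a normalized commit count, using a weight `alpha`. The `FileStats` shape, the `alpha = 0.5` default and the numbers are hypothetical. Another option, not shown, is a personalized PageRank whose teleport step favors frequently changed files.

```typescript
// Hypothetical per-file inputs: a PageRank score and a commit count.
interface FileStats {
  file: string;
  pageRank: number;
  commits: number;
}

// alpha = 1 is pure PageRank; alpha = 0 is pure commit frequency.
function blendedScores(stats: FileStats[], alpha = 0.5): Map<string, number> {
  // Normalize both signals to [0, 1]; "|| 1" avoids dividing by zero.
  const maxRank = Math.max(...stats.map((s) => s.pageRank)) || 1;
  const maxCommits = Math.max(...stats.map((s) => s.commits)) || 1;
  const result = new Map<string, number>();
  for (const s of stats) {
    result.set(s.file, alpha * (s.pageRank / maxRank) + (1 - alpha) * (s.commits / maxCommits));
  }
  return result;
}

// Illustrative numbers for the scenario described above.
const stats: FileStats[] = [
  { file: "models.js", pageRank: 0.3, commits: 150 },
  { file: "dataUtils.js", pageRank: 0.25, commits: 2 },
  { file: "main.js", pageRank: 0.02, commits: 140 },
];
blendedScores(stats).forEach((score, file) => console.log(file, score.toFixed(3)));
```

With these illustrative numbers, models.js ranks first, main.js second and dataUtils.js last, which matches the ordering described above.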
        
       | dzdt wrote:
       | My favorite approach for finding important files is to look at
       | which files have the greatest number of changes in source
       | control. At least with Perforce this is very easy. The files that
       | have many changes are the ones where important logic happens. The
       | ones that don't change much are boilerplate, low-level object
       | definitions, etc.
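The comment describes Perforce. As a swapped-in example, this sketch does the same counting for git. Its input is the text printed by `git log --pretty=format: --name-only`, which gives one path per line with blank lines between commits. The function name and the sample log are hypothetical, and renames are not tracked.

```typescript
// Count how many commits touched each file, given the text printed by
//   git log --pretty=format: --name-only
function changeCounts(gitLogOutput: string): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const line of gitLogOutput.split("\n")) {
    const path = line.trim();
    if (path === "") continue; // blank lines separate commits
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  // Most frequently changed files first.
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}

// Illustrative log text for three commits.
const sampleLog = [
  "src/app.ts",
  "src/models.ts",
  "",
  "src/app.ts",
  "",
  "src/app.ts",
  "README.md",
].join("\n");

for (const [file, n] of changeCounts(sampleLog)) console.log(n, file);
```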
        
       ___________________________________________________________________
       (page generated 2022-07-02 23:02 UTC)