[HN Gopher] Deprank: Use PageRank to find the most important fil...
___________________________________________________________________
Deprank: Use PageRank to find the most important files in your
codebase
Author : phpnode
Score : 148 points
Date : 2022-07-01 09:49 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| londons_explore wrote:
| I can imagine that "utils.js" and "math.h" rank highest, while
| "main.c" will probably have the lowest rank.
|
| Doesn't sound like the ranking metric I'd want for code search
| results...
| rullelito wrote:
| I can imagine that if you reverse all dep directions and do
| page rank on that as well, you can create a better ranking by
| calculating max(rank, revrank)
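A minimal sketch of the max(rank, revrank) idea above, assuming the dependency graph is a plain object mapping each file to the files it imports, and that every imported file also appears as a key. The names `pageRank`, `reverse`, `combinedRank` and the example file names are hypothetical and not taken from deprank's source; the damping factor and iteration count are illustrative defaults.

```typescript
// Dependency graph: each file maps to the list of files it imports.
// Assumption: every imported file also appears as a key.
type Graph = Record<string, string[]>;
type Ranks = Record<string, number>;

// Plain iterative PageRank (power iteration with a fixed number of rounds).
function pageRank(graph: Graph, damping = 0.85, iterations = 50): Ranks {
  const nodes = Object.keys(graph);
  const n = nodes.length;
  let rank: Ranks = {};
  for (const file of nodes) rank[file] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    const next: Ranks = {};
    for (const file of nodes) next[file] = (1 - damping) / n;
    for (const file of nodes) {
      const deps = graph[file] ?? [];
      // A file with no imports spreads its rank evenly over all files.
      const targets = deps.length > 0 ? deps : nodes;
      const share = (damping * (rank[file] ?? 0)) / targets.length;
      for (const target of targets) next[target] = (next[target] ?? 0) + share;
    }
    rank = next;
  }
  return rank;
}

// Flip every edge so that "imports" becomes "is imported by".
function reverse(graph: Graph): Graph {
  const reversed: Graph = {};
  for (const file of Object.keys(graph)) reversed[file] = [];
  for (const file of Object.keys(graph)) {
    for (const dep of graph[file] ?? []) {
      const list = reversed[dep] ?? [];
      list.push(file);
      reversed[dep] = list;
    }
  }
  return reversed;
}

// Score each file by the larger of its forward and reversed PageRank.
function combinedRank(graph: Graph): Ranks {
  const forward = pageRank(graph);
  const backward = pageRank(reverse(graph));
  const combined: Ranks = {};
  for (const file of Object.keys(graph)) {
    combined[file] = Math.max(forward[file] ?? 0, backward[file] ?? 0);
  }
  return combined;
}

const example: Graph = {
  "main.js": ["app.js", "utils.js"],
  "app.js": ["utils.js"],
  "utils.js": [],
};
console.log(combinedRank(example));
```

With this scoring, an entry point that imports everything and a utility that everything imports both end up near the top, while files in the middle of the graph rank lower.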
| charcircuit wrote:
| It seems fine for me. When you search for a function definition
| it will tend to show you the most commonly used one.
| dan-dev wrote:
| Great project!
|
| One feature request: Running the npx command searched only for
| the js files, not for the ts files. When I built deprank locally
| with yarn, it also showed the ts files. After looking at
| dependency-cruiser, I figure it has to do with which TypeScript
| compilers are available where.
|
| It would be great if the npx command you provide in your readme
| would work regardless of my local setup - dependency-cruiser has
| documentation and one example of a suitable npx command here:
| https://github.com/sverweij/dependency-cruiser/blob/develop/...
|
| My suggestion would be to check if any ts file is part of the
| extension option (i.e. --ext=".js,.jsx, .ts, .tsx") and only then
| do the magic needed to also show ts files.
| phpnode wrote:
| thanks for this comment, I'll look into fixing it. In the
| meantime I'm curious whether it works when you install deprank
| directly? e.g.
|     yarn add deprank
|     yarn run deprank ./src
| dan-dev wrote:
| Thank you! :)
|
| It worked when installing inside the project folder and did
| not work when installing outside the project folder:
| // Success: Running in the project folder.
|     me:~/deprank$ yarn add deprank
|     yarn add v1.22.19
|     warning ../package.json: No license field
|     [1/5] Validating package.json...
|     [2/5] Resolving packages...
|     [3/5] Fetching packages...
|     [4/5] Linking dependencies...
|     [5/5] Building fresh packages...
|     success Saved lockfile.
|     success Saved 1 new dependency.
|     info Direct dependencies
|     +- deprank@0.1.1
|     info All dependencies
|     +- deprank@0.1.1
|     $ tsdx build
|     @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
|     @rollup/plugin-replace: 'preventAssignment' currently defaults to false. It is recommended to set this option to `true`, as the next major version will default this option to `true`.
|     Creating entry file 602 ms
|     Building modules 1.4 secs
|     Done in 5.95s.
|
|     me:~/deprank$ yarn run deprank src/
|     yarn run v1.22.19
|     warning ../package.json: No license field
|     $ /home/dan/deprank/node_modules/.bin/deprank src/
|     | Filename     | Lines | Dependents | PageRank |
|     ------------------------------------------------
|     | src/index.ts | 280   | 0          | 1.000000 |
|     Done in 0.55s.
|
|     me:~/deprank$ yarn run deprank .
|     yarn run v1.22.19
|     warning ../package.json: No license field
|     $ /home/me/deprank/node_modules/.bin/deprank .
|     | Filename                           | Lines | Dependents | PageRank |
|     ----------------------------------------------------------------------
|     | fixtures/core.js                   | 3     | 1          | 0.191112 |
|     | fixtures/utils.js                  | 4     | 3          | 0.180576 |
|     | fixtures/user/user.js              | 4     | 1          | 0.088966 |
|     | src/index.ts                       | 280   | 1          | 0.069599 |
|     | fixtures/todo.js                   | 6     | 1          | 0.060405 |
|     | fixtures/user/index.js             | 1     | 1          | 0.060405 |
|     | dist/deprank.cjs.development.js    | 829   | 1          | 0.053610 |
|     | dist/deprank.cjs.production.min.js | 2     | 1          | 0.053610 |
|     | fixtures/concepts.js               | 4     | 1          | 0.053610 |
|     | dist/deprank.esm.js                | 820   | 0          | 0.037621 |
|     | dist/index.d.ts                    | 36    | 0          | 0.037621 |
|     | dist/index.js                      | 8     | 0          | 0.037621 |
|     | fixtures/index.js                  | 4     | 0          | 0.037621 |
|     | test/deprank.test.ts               | 28    | 0          | 0.037621 |
|     Done in 0.60s.
|
| ----------------------------------------------------------------------
|
| // Failure: Running outside of the project folder:
|     me:~$ yarn add deprank
|     yarn add v1.22.19
|     warning package.json: No license field
|     warning package-lock.json found. Your project contains lock files generated by tools other than Yarn. It is advised not to mix package managers in order to avoid resolution inconsistencies caused by unsynchronized lock files. To clear this warning, remove package-lock.json.
|     warning No license field
|     [1/4] Resolving packages...
|     [2/4] Fetching packages...
|     [3/4] Linking dependencies...
|     [4/4] Building fresh packages...
|     success Saved lockfile.
|     warning No license field
|     success Saved 3 new dependencies.
|     info Direct dependencies
|     +- deprank@0.1.1
|     +- node@18.4.0
|     info All dependencies
|     +- deprank@0.1.1
|     +- node-bin-setup@1.1.0
|     +- node@18.4.0
|     Done in 4.95s.
|
|     me:~$ yarn run deprank deprank/
|     yarn run v1.22.19
|     warning package.json: No license field
|     $ /home/me/node_modules/.bin/deprank deprank/
|     | Filename                                   | Lines | Dependents | PageRank |
|     ------------------------------------------------------------------------------
|     | deprank/fixtures/core.js                   | 3     | 1          | 0.223479 |
|     | deprank/fixtures/utils.js                  | 4     | 3          | 0.211161 |
|     | deprank/fixtures/user/user.js              | 4     | 1          | 0.104035 |
|     | deprank/fixtures/todo.js                   | 6     | 1          | 0.070637 |
|     | deprank/fixtures/user/index.js             | 1     | 1          | 0.070637 |
|     | deprank/dist/deprank.cjs.development.js    | 829   | 1          | 0.062691 |
|     | deprank/dist/deprank.cjs.production.min.js | 2     | 1          | 0.062691 |
|     | deprank/fixtures/concepts.js               | 4     | 1          | 0.062691 |
|     | deprank/dist/deprank.esm.js                | 820   | 0          | 0.043993 |
|     | deprank/dist/index.js                      | 8     | 0          | 0.043993 |
|     | deprank/fixtures/index.js                  | 4     | 0          | 0.043993 |
|     Done in 0.28s.
| xcambar wrote:
| Off-topic: I've read the name as if it were "de-prank".
| krylon wrote:
| As did I, resulting in slight disappointment when I eventually
| figured out what this was actually about. There sure are some
| code bases out there that could use a bit of de-pranking. ;-)
| einszwei wrote:
| I read it as "deep-rank" but I guess the intended reading is
| "dep-rank".
| pindab0ter wrote:
| Out of curiosity: How do you reed "dep" with one 'e' as
| "deep"?
| manfre wrote:
| Humn brain is grat at patrn matching nd haz auto correct.
| voidfunc wrote:
| I also read it as deeprank
| tessierashpool wrote:
| deprogram
|
| depart
|
| depend
|
| deploy
|
| deport
|
| depose
|
| deprive
|
| depraved
|
| depressurize
|
| depress
|
| depopulate
|
| depoliticize
|
| deposit
| IvyMike wrote:
| I've always wanted to see my reddit (and I guess HN) karma
| evaluated as my pagerank of comments. (Yes, lots of downsides--
| people with high karma have more "power". Yes it would be
| instantly gamed. I still want to see it.)
|
| It would also be neat to see a reddit-like website with multiple
| formulas for "karma" all evaluated at the same time, like IVYMIKE
| (PageRank:1234 Classic: 1656 Experimental: 78)
| bee_rider wrote:
| It would be sort of funny to apply some reordering algorithms to
| the dependency graph, and use that to refactor a project. Maybe
| nested dissection. Or some fancy hypergraph reordering...
| zeroth32 wrote:
| It would be very useful as an IntelliJ IDEA plugin; it could hook
| into the IDEA AST and work in any language. And not just files, but
| methods. And it could be contextual depending on edit history and
| current context.
|
| Probably a great tool for a quick start on a new project.
| pbackup12345 wrote:
| Well, technically this is a godsend. However, when I try to run it
| on some of my projects, both JavaScript and TypeScript, it comes
| up totally empty as of now. Bug?
| phpnode wrote:
| hmm, sounds like it, please could you open a github issue with
| some more info?
|
| edit: I've just fixed an issue that looks similar to this,
| please run with `npx deprank@0.1.1 ./path/to/folder` and let me
| know if you're still having problems
| pbackup12345 wrote:
| Cool. That was it. Works now. Hats off to you. This has more
| uses than just typescripting. Actually you can spot in React
| apps, probably in other apps, too, the dependencies which
| break your code splitting.
| toxik wrote:
| This becomes impossible to do correctly because of the halting
| problem, interestingly. For example, suppose a routine calls F in
| a loop for most of its work, then at the end takes the square
| root by sqrtf(). Clearly number of calls matters for the edge
| weight in the call graph, but this tool would count F and sqrtf
| equal.
|
| I suppose you could do it by sampling, then you actually just
| have to look at the sample distribution, though that would show
| you the graph weighted by cumulative execution times per routine.
|
| As they say though, never let perfect be the enemy of good. Neat
| idea.
| lpapez wrote:
| I think you are mixing performance analysis with dependency
| analysis which is the point of the project. The sampling you
| are describing is commonly done by tools called "sampling
| profilers".
| enneff wrote:
| This actually has nothing to do with how often the code is
| executed. Code is ranked by its referents.
| toxik wrote:
| Right, which is exactly what I pointed out. The true call
| graph can only be obtained by basically running the program.
|
| Counting references is clearly a compromise here. To see its
| drawbacks, consider indirection and dynamism.
| naniwaduni wrote:
| This reaction is kind of like reading about a pre-election
| poll where the participants were selected by pulling them
| off 86th street at 11 am, and objecting that the
| percentages shouldn't be presented with more than two sig
| figs with a sample size of 100.
|
| You're not, strictly speaking, wrong. But the methodology
| is already known to be deeply compromised, so your
| objection is kind of out there.
| toxik wrote:
| It was an observation that the choice to use references
| is actually a must, because of the halting problem, which
| I found interesting.
|
| Not everything is an argument.
| rajnathani wrote:
| This should be a "Show HN", you'll likely get more coverage too.
|
| Question: Which package do you use for the linear algebra
| calculation? I couldn't figure it out by skimming through your
| source code.
| charcircuit wrote:
| >Which package do you use for the linear algebra calculation?
|
| He doesn't.
| https://github.com/codemix/deprank/blob/main/src/index.ts#L1...
| zetalemur wrote:
| That's actually a nice idea. I guess we will see more software in
| the source code dependencies analysis space. There's so much code
| and often it's nice to have some kind of metrics (LOC, PageRank,
| ...) to get a grasp of what's important in a codebase.
| lampshades wrote:
| Just started a new job and having a filetype agnostic version of
| this would be immensely helpful for learning the codebase.
| bjornsing wrote:
| > We define importance as those files which are directly or
| indirectly depended upon the most by other files in the codebase.
|
| I honestly would have expected the opposite definition. Maybe I'm
| kind of old school, but in a well-architected large c-program for
| instance, "main.c" tends to depend (directly or indirectly) on
| every other compile unit, while there are no dependencies in the
| other direction. And I think "main.c" should be seen as
| "important".
|
| Why would this not be true for JavaScript or typescript?
| noveltyaccount wrote:
| I think I'm reading this opposite from you.
|
| `main.c` _depends upon_ everything else, but is not _depended
| upon_ by anything else. A file like `datatypes.c` might be
| depended upon by multiple tiers of the application and be
| referenced by dozens of files, making it have a high pagerank.
| mike_hock wrote:
| Might be useful to navigate an unknown codebase. If you know
| the codebase already, you should know what forms the core
| that everything else depends upon.
|
| The term "importance" might also be slightly misleading.
| What's more important, the engine or the transmission? The
| car needs both to drive.
| Jensson wrote:
| If you are looking for a utility function then it will likely
| be found in a file that is imported in many other places. So
| intuitively this would be a reasonable way to rank files/search
| results when you are looking for something to import.
| atwood22 wrote:
| Another example is interface usage. You may have a commonly
| used interface with only one implementation. The implementation
| is the important part, but almost no usage depends directly on
| the implementation.
| yodon wrote:
| If you are looking to introduce types into an untyped
| codebase, as this project talks about, then you probably
| don't have a lot of interfaces defined, and if you do have
| things in your untyped codebase that are loosely analogous to
| interfaces from a decoupling standpoint (such as facades and
| factories) then this approach would advocate those are
| sensible areas to focus your initial typing efforts.
| dack wrote:
| they mentioned using it to add types to a javascript project. i
| can see why you'd start by adding types to the files that are
| most used first
| CapsAdmin wrote:
| I agree, but seen through the lens of wanting to clean up a
| codebase this seems useful. It's stated in the second paragraph
| on github that it's specifically useful for converting
| javascript to typescript as well.
|
| If you have a messy large c-program with the intention of
| cleaning it up, would you say main.c is the most important
| file?
|
| I think it's an interesting idea that could work on top of this
| utility to give you an overview of how messy the dependency
| graph of a project is.
| ape4 wrote:
| If lots of functions depend on main.c then you have a problem
| ;)
| ape4 wrote:
| Perhaps this could be paired with a static analysis of the code's
| quality. Then you'd get the most used code with the worst
| quality.
| avivo wrote:
| Fascinating. "Deprank is particularly useful when converting an
| existing JavaScript codebase to TypeScript. Performing the
| conversion in strict PageRank order can dramatically increase
| type-precision, reduce the need for any and minimizes the amount
| of rework that is usually inherent in converting large
| codebases." I wonder if the idea of using PageRank-style systems
| for ~refactoring translates to other domains, e.g. organizational
| or knowledge refactoring (à la
| https://aviv.medium.com/when-we-change-the-efficiency-of-kno... )
| hamasho wrote:
| Great project! I can imagine this helps me get used to new
| codebases.
|
| As others have already mentioned, it's better if it ranks
| important files like main.js higher than other util files.
| models.js and dataUtils.js can be imported from 20 files, but if
| dataUtils.js is created at the beginning of the project and
| barely touched since then, it should be ranked lower.
|
| I think it's nice to rank frequently updated files higher, just
| like real page rank algorithms. If models.js is imported from
| many files and updated frequently, it should be ranked at the
| top. main.js is imported nowhere but updated frequently; it
| should also be higher. dataUtils.js is imported from many files
| but has not been updated for a long time; it should be lower.
|
| It's much more complicated than normal static analysis. Source
| code is no longer the single source of truth. VCS' history should
| also be considered. It has to manage those SEO scammers too. We
| know those who commit again and again like "Update, Fix bug, Fix
| typo, Fix lint, Update, Fix format, ..." instead of a single
| meaningful commit message.
|
| But if it's implemented properly, it helps explore unfamiliar
| codebases much faster.
| dzdt wrote:
| My favorite approach for finding important files is to look at
| which files have the most changes in the source
| control. At least with Perforce this is very easy. The files that
| have many changes are ones where important logic happens. Ones
| that don't change much are boilerplate, low level object
| definitions, etc.
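A rough sketch of the change-count approach above, swapping Perforce for git as an assumption: it expects the text printed by `git log --name-only --pretty=format:` (one touched path per line, blank lines between commits) and counts how often each path appears. The function names `countChanges` and `mostChanged` and the sample log are hypothetical.

```typescript
// Count how many commits touched each file, given the output of
// `git log --name-only --pretty=format:` as a single string.
function countChanges(gitLogOutput: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of gitLogOutput.split("\n")) {
    const path = line.trim();
    if (path === "") continue; // blank lines separate commits
    counts[path] = (counts[path] ?? 0) + 1;
  }
  return counts;
}

// Return the most frequently changed files, highest count first.
function mostChanged(counts: Record<string, number>, limit = 10): [string, number][] {
  return Object.keys(counts)
    .map((path): [string, number] => [path, counts[path] ?? 0])
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Made-up sample standing in for real `git log` output.
const sampleLog = [
  "src/index.ts",
  "README.md",
  "",
  "src/index.ts",
  "",
  "src/index.ts",
  "src/utils.ts",
].join("\n");

console.log(mostChanged(countChanges(sampleLog)));
```

In practice the input would come from running the git command in the repository and passing its standard output to `countChanges`. Merge commits list no files by default, so they do not inflate the counts.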
___________________________________________________________________
(page generated 2022-07-02 23:02 UTC)