[HN Gopher] Show HN: Credit reports about German companies
___________________________________________________________________
Show HN: Credit reports about German companies
Hello, In addition to my studies in computer science, I have been
working on a side project. I obtain data from the
Unternehmensregister, a register where every German limited company
is required to publish their financial statements. These statements
are published as HTML files and are completely unstructured. While
financial statements often look similar, companies are not required
to follow a specific structure, which often leads to inconsistently
formatted statements. The use of the Unternehmensregister is
completely free, so you can check out some examples. I wrote code
that converts the unstructured financial statements into structured
data using the ChatGPT API. This works well. Of course, there are
some problems that have not yet been solved, but data extraction
works well for the majority of companies. I then coded a Random
Forest algorithm to estimate the probability of default for a
company based on its financial statement from the
Unternehmensregister. I built a website to present the structured
data along with the scores. Essentially, I create credit reports
for companies. Currently, there are four companies in Germany that
also create credit reports (Schufa, Creditreform, Crif, and
Creditsafe). Other companies resell the data from these four
providers. I provide the same services as these companies, but
without including personal information such as directors or
investors. The market for this service is quite large; for example,
Creditreform sold over 26 million credit reports about companies in
2020. My probability of default prediction performs quite well,
achieving an AUC score of 0.87 on my test data. An AUC of 0.87
means that there is an 87% chance that the model ranks a randomly
selected company that defaults higher than a randomly selected
company that does not default. Additionally, there are many more
companies to crawl for my database. Currently, I am focusing on
companies that are required to publish their profit and loss
statements. For testing purposes, there are currently 2,000
companies available on my website. At the moment, the website is
only available in German, but you can use Google Translate, which
works ok for my website. Thank you very much for your feedback!
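
The pairwise reading of AUC described in the post can be checked directly on a toy example. This is a minimal sketch, not the author's evaluation code: the function name `pairwise_auc`, the scores, and the labels are all made up for illustration.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the share of (defaulted, non-defaulted) pairs in which the
    defaulted company gets the higher score. Ties count as half a win."""
    defaulted = [s for s, y in zip(scores, labels) if y == 1]
    solvent = [s for s, y in zip(scores, labels) if y == 0]
    if not defaulted or not solvent:
        raise ValueError("need at least one company of each class")
    wins = 0.0
    for d, s in product(defaulted, solvent):
        if d > s:
            wins += 1.0
        elif d == s:
            wins += 0.5
    return wins / (len(defaulted) * len(solvent))


# Hypothetical model outputs: 1 = company defaulted, 0 = it did not.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
auc = pairwise_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Here 11 of the 12 possible pairs are ordered correctly, so the toy AUC is 11/12. An AUC of 0.87 means the same thing with 87% of pairs ordered correctly.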
Author : gab_
Score : 34 points
Date : 2024-12-12 16:29 UTC (6 hours ago)
(HTM) web link (bonscore.org)
(TXT) w3m dump (bonscore.org)
| WhatsName wrote:
| Given you are operating in Germany, where people send cease-
| and-desist letters (Abmahnungen) purely for monetary gain, I
| would highly recommend that you take good care of compliance
| topics like having a proper Impressum (mandatory contact page).
|
| Unless, of course, you plan on never growing it into a
| business; then you might get away with Njalla and Cloudflare as
| an invisibility cloak...
| gab_ wrote:
| Yes, of course, but at the moment the site is only there to try
| out my idea and evaluate its potential.
| lukan wrote:
| Likely nothing will happen, but the way it looks, it is not
| obvious it is only for testing, and you have now posted it in
| the open and there seems to be a commercial intent. The law is
| quite clear there, last time I checked: it needs an address.
| (Even for noncommercial projects it seems advisable.)
| farbklang wrote:
| That doesn't matter. Impressumspflicht means it's your
| pflicht to have an Impressum. You can also get an Abmahnung
| as an individual.
| WhatsName wrote:
| That is in principle correct, but OP is using a throwaway
| and a known anonymous domain service. As long as he burns
| bridges behind himself (e.g. not making it into an
| addressable business later), there is no individual to
| deliver a letter to.
|
| Might work for a while, but a dangerous game to play...
| echoangle wrote:
| It's a bit more nuanced: you don't need an Impressum for
| purely personal websites without financial background.
| Evaluating a business idea would probably count as
| financial motive though, even if it isn't currently
| monetized.
| leobg wrote:
| The irony is... if you don't have an impressum, where should
| they send their abmahnung? :) In 2008, they could still look up
| your WHOIS. But this doesn't work anymore since GDPR.
| echoangle wrote:
| I can't positively claim that's the case but I wouldn't be
| surprised if your registrar has to give out your information
| if a valid legal claim comes in.
| davedx wrote:
| Excellent stuff! I've worked in this area. Have you considered
| applying ratios like the Altman z-score?
|
| I'm also curious how you back tested to get the final scoring.
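
For readers unfamiliar with the Altman Z-score mentioned above, here is a minimal sketch of the original 1968 formula. The coefficients and cut-offs are the ones Altman published for listed US manufacturing firms; whether they suit German limited companies is an open assumption, and market value of equity is usually unavailable for private firms. The function name and the input figures are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Original Altman Z-score (1968) for listed manufacturing firms."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Classic cut-offs: below 1.81 distress, above 2.99 safe, else grey."""
    if z < 1.81:
        return "distress"
    if z > 2.99:
        return "safe"
    return "grey"


# Made-up balance sheet figures (same currency unit throughout).
z = altman_z(working_capital=200, retained_earnings=300, ebit=100,
             market_value_equity=600, sales=1500,
             total_assets=1000, total_liabilities=400)
print(f"Z = {z:.2f} ({altman_zone(z)})")
```

The five ratios could also be fed into a model such as a Random Forest as additional features rather than used as a stand-alone score.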
| gab_ wrote:
| Thank you! Not yet, but it sounds interesting. The
| possibilities are endless. Currently, I am testing some methods
| from survival analysis.
|
| The data is very imbalanced; there are very few insolvent
| companies compared to solvent ones. Therefore, I work with
| synthetic data in my training dataset. To get the final score,
| I need to scale the predictions to achieve a heavily right-
| skewed distribution.
|
| Currently, I am using Platt scaling.
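
Platt scaling fits a sigmoid that maps raw model scores to probabilities. The sketch below is not the author's code. It assumes the raw score is something like the share of trees voting "default", uses plain gradient descent instead of the optimiser from Platt's paper, and runs on made-up data. The sign convention is p = sigmoid(a * score + b).

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit a, b in p = sigmoid(a * score + b) by minimising log loss
    with batch gradient descent (a simple stand-in optimiser)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = _sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated probability of default."""
    return _sigmoid(a * score + b)


# Hypothetical held-out raw scores and outcomes (1 = default).
raw = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
outcome = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_platt(raw, outcome)
print(calibrate(0.1, a, b), calibrate(0.9, a, b))
```

If the model was trained on synthetically rebalanced data, the sigmoid is usually fitted on a held-out set with the real default rate, so that the calibrated probabilities reflect how rare insolvencies actually are.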
| costco wrote:
| It's cool that you were able to get the data even though it's not
| perfectly structured. Maybe you'll be the Dun & Bradstreet of
| Germany :)
| gab_ wrote:
| Thank you, maybe :)
| PeterStuer wrote:
| When I consulted in European financial services ICT (credit
| scoring and automated decisioning for asset-based finance, AML
| etc.) the German data had an explicit regulatory restriction that
| data could only be obtained for one specific transaction and that
| that consent was not transferable. We obtained the data on
| company X explicitly for transaction Y. We could not pass on the
| data, nor simply reuse it for another purpose.
|
| Has that changed?
| gab_ wrote:
| I only work with public data and not personal data, even if it
| is publicly available. Anyone can look at the
| Unternehmensregister, the data that is uploaded there is not
| for a specific purpose. This data is there to inform e.g.
| customers, suppliers, creditors, employees etc. about the
| company's activities.
| drchaos wrote:
| This is pretty interesting, I think you should do two things
| right now:
|
| 1. add a message box stating that it is experimental and has only
| a very small set of companies right now
|
| 2. add an option to get notified when you have a more complete
| dataset (just use a Google form to collect email addresses)
|
| Reason: Searched for my company, no result, ok, we're too small.
| Searched for some DAX companies, no results either => site looks
| broken.
|
| Additional ideas:
|
| * Add information from insolvenzbekanntmachungen.de, it's a major
| PITA to find someone there
| * Provide a (paid) API so it can be integrated into shop systems
| etc.
|
| A Creditreform membership is quite expensive, probably worth it
| for larger shops, but for small enterprises your solution might
| come in handy.
| gab_ wrote:
| Thank you! I've just fixed the first point. For the second
| point, I have to look for an alternative to Google Forms. At
| the moment I am not yet obtaining any data from
| insolvenzbekanntmachungen.de. Insolvencies are also stored in the
| Unternehmensregister. However, this could certainly be
| integrated quite well.
___________________________________________________________________
(page generated 2024-12-12 23:01 UTC)