Post AvfKgakebWN5xpkNAO by Pi_rat@freesoftwareextremist.com
(DIR) More posts by Pi_rat@freesoftwareextremist.com
(DIR) Post #Avex0anVkEXgzz5UQq by Pi_rat@freesoftwareextremist.com
2025-06-30T15:02:10.953072Z
5 likes, 3 repeats
So I have been making an image tagging program that will allow keys to be bound to tags, so an image can be tagged with multiple tags at once. This is my first serious project and I have been having a lot of fun working on it.
This is a script prototype for it that requires a (-f) file with names of images in (-d) dir to be tagged with (-t) tag. It uses sqlite3 as the db (and so will the gui version). I still have to write a query script for this; for now it can only tag. All of these files have to be in the same directory, and it creates the db file $HOME/.local/share/tagg/tags.db (--init).
I was first going for a separate db for every directory, but then a friend suggested to "not do that". Retroactively adding this was not fun but it is done. -u is needed for every directory the first time. I use sxiv -o to get image names.
Now I'm certainly doing a lot of things wrong, like having raw sql statements in functions and passing the cursor to functions. So I need criticism, and hard. Be harsh, make me cry.
PS @p How do you manage your images, since you always seem to have relevant images ready for the occasion?
cc @SuperDicq
(DIR) Post #AveyFTH46A3ZxnZOBU by SuperDicq@minidisc.tokyo
2025-06-30T15:16:02.623Z
4 likes, 0 repeats
@Pi_rat@freesoftwareextremist.com @p@fsebugoutzone.org I gotta say this is not the most ideal way of distributing software. Your instance has turned these filenames into a bunch of random hashes.
(DIR) Post #AveyR0qZcGIhGygLIW by laurel
2025-06-30T15:18:10.852150Z
2 likes, 0 repeats
@Pi_rat @SuperDicq @p
> How do you manage your images since you always seem to have relevant images ready for occasion.
Have you ever used hydrus? https://hydrusnetwork.github.io/hydrus/index.html
I've been using it for years now.
(DIR) Post #Avez2w0o2aPPHWYD9E by SuperDicq@minidisc.tokyo
2025-06-30T15:24:57.581Z
3 likes, 0 repeats
@laurel@fsebugoutzone.org @Pi_rat@freesoftwareextremist.com @p@fsebugoutzone.org Hydrus is the most disgusting pile of bloated chanware that I've ever used with the worst UI known to man. I like the idea of the program and I tried to like it so much but I just absolutely despise it.
I will most likely prefer something that is much more minimal for this like the less than 200 LOC thing that Pi_rat has written probably.
(DIR) Post #Avez2wpqynF3pqQyiO by Pi_rat@freesoftwareextremist.com
2025-06-30T15:25:01.028908Z
1 likes, 0 repeats
@SuperDicq @p :( yes I know but I dont want to host it anywhere except my repo, codeberg is good but I dont have my personal email to sign up.
https://paste.debian.net/1383579/ (helpers.py)
https://paste.debian.net/1383581/ (tagg.py this is main script)
https://paste.debian.net/1383583/ (conf.py for now its really small but I am thinking storing directories that and some behaviors)
https://paste.debian.net/1383585/ (init.sql)
(DIR) Post #AvezIM2zWaCAN8vSHQ by laurel
2025-06-30T15:27:49.271182Z
3 likes, 1 repeats
@SuperDicq @p @Pi_rat
> Hydrus is the most disgusting pile of bloated chanware that I've ever used with the worst UI known to man.
This is true, yes.
(DIR) Post #AvezJPZ9vXY3ZjTYwa by SuperDicq@minidisc.tokyo
2025-06-30T15:27:56.021Z
3 likes, 0 repeats
@Pi_rat@freesoftwareextremist.com @p@fsebugoutzone.org Anyway I read through the code and I see no major issues with it. It looks simple and clear to read. The database schema also seems properly normalized.
I'm most curious about what the UI for this will look like because that will be what makes or breaks it for me. Tagging an image with multiple tags and then going to the next should require the least amount of clicks and button presses.
(DIR) Post #AvezSm57aHk00otHHs by p
2025-06-30T15:29:42.257068Z
7 likes, 0 repeats
@Pi_rat @SuperDicq
> @p How do you manage your images since you always seem to have relevant images ready for occasion.
I have a good memory (see attached) and also `find|grep` and also somewhat more recently I installed Tesseract and that lets me grep images.
As far as the code, just from reading the SQL it looks like this is how you'd want to structure it. sqlite3 is good for this kind of thing; it's probably the easiest option. (Flat files and grep aren't to be underestimated, but you don't learn a lot from doing things that way.)
[attachment: incelligence.jpeg]
(DIR) Post #AvezWpslOZacxT2y0m by hj@shigusegubu.club
2025-06-30T15:30:25.566303Z
3 likes, 0 repeats
@laurel @Pi_rat @p @SuperDicq and i still love it
(DIR) Post #AvezYi0vcX2v8Nl7qK by Pi_rat@freesoftwareextremist.com
2025-06-30T15:30:44.116410Z
2 likes, 0 repeats
@laurel @p @SuperDicq I have not personally used Hydrus or any other application. I tried DigiKam but gave up on it quite early. This seemed like an easy first project so I thought why not. Mainly I needed an app that I can quickly tag with, but while thinking and writing I have had so many ideas for the gui version, like multiple OR and AND clauses...
(DIR) Post #Avezfwrun1clERzldA by SuperDicq@minidisc.tokyo
2025-06-30T15:32:01.849Z
2 likes, 0 repeats
@hj@shigusegubu.club @laurel@fsebugoutzone.org @Pi_rat@freesoftwareextremist.com @p@fsebugoutzone.org The thing I hate about it the most is that it really wants to take control of your filesystem and move copies of all your files into its own folder structure.
I don't want it to do that. Just leave my fucking files where they are and don't make a copy. Just keep track of where my files are in your database.
(DIR) Post #AvezvBPZ0DRqgAzy8e by Pi_rat@freesoftwareextremist.com
2025-06-30T15:34:49.062398Z
1 likes, 0 repeats
@p @SuperDicq I can relate, I can generally guess the time frame where image will be depending on surrounding images but I would still like to be able to quickly find them......
(DIR) Post #AvezyJ0RDocb6QjduS by hj@shigusegubu.club
2025-06-30T15:35:22.636974Z
1 likes, 0 repeats
@SuperDicq @p @laurel @Pi_rat that is extremely complicated, and i can tell this because i myself tried to implement that. You keep a filename: namespace if you need filenames and not the structure (you can also import structure as tags)
[attachment: man held up by ps3 games.jpg]
(DIR) Post #Avf0agwUfvcYLPC6Bk by Pi_rat@freesoftwareextremist.com
2025-06-30T15:42:19.113916Z
1 likes, 0 repeats
@SuperDicq @p I am thinking more like sxiv with a sidebar for manipulating tags and the keys they are bound to (and sxiv key binds, so n saves whatever tags were added to this image and moves to the next). I'm also thinking I will add "views" for advanced querying, like (meme or pepe) and (computer or wizard).
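A query like (meme or pepe) and (computer or wizard) could be expressed against a many-to-many tag schema roughly as sketched below. The table and column names (images, tags, image_tags) and the helper find_images are assumptions for illustration, not the actual tagg schema or code.

```python
import sqlite3

# Hypothetical schema: the real init.sql may use different names.
SCHEMA = """
create table images (id integer primary key, image text unique);
create table tags (id integer primary key, tag text unique);
create table image_tags (
    image_id integer references images(id),
    tag_id integer references tags(id),
    primary key (image_id, tag_id)
);
"""

def find_images(db, groups):
    """Return image names matching every group, where a group matches
    if the image carries at least one of the group's tags."""
    clauses, params = [], []
    for group in groups:
        marks = ", ".join("?" for _ in group)
        clauses.append(
            "exists (select 1 from image_tags it join tags t on t.id = it.tag_id"
            f" where it.image_id = i.id and t.tag in ({marks}))"
        )
        params.extend(group)
    where = " and ".join(clauses) if clauses else "1"
    sql = f"select i.image from images i where {where} order by i.image"
    return [row[0] for row in db.execute(sql, params)]

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("insert into images (id, image) values (?, ?)",
               [(1, "a.png"), (2, "b.png"), (3, "c.png")])
db.executemany("insert into tags (id, tag) values (?, ?)",
               [(1, "meme"), (2, "pepe"), (3, "computer"), (4, "wizard")])
db.executemany("insert into image_tags values (?, ?)",
               [(1, 1), (1, 3), (2, 2), (3, 2), (3, 4)])

# (meme or pepe) and (computer or wizard)
matches = find_images(db, [["meme", "pepe"], ["computer", "wizard"]])
print(matches)
```

Each inner list is one OR group and the groups are ANDed together; only the "?" placeholders are formatted into the SQL, while the tag names are still bound as parameters.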
(DIR) Post #Avf0eMElbC7JHoUOga by p
2025-06-30T15:43:00.164085Z
1 likes, 0 repeats
@SuperDicq @Pi_rat Well, it is easier to view in the browser; tarball might be easier though.
(DIR) Post #Avf2hjGEeaYDTCuZqy by p
2025-06-30T16:06:01.324067Z
1 likes, 0 repeats
@SuperDicq @hj @laurel @Pi_rat
> Just leave my fucking files where they are and don't make a copy.
ABSOLUTELY YES THAT
When x11amp^Wxmms finally stopped building, the main thing I wanted from a music player was "I give it a directory and it plays the music in that directory."
(DIR) Post #Avf2nT0bzVaAnqL7vU by hj@shigusegubu.club
2025-06-30T16:07:02.181321Z
2 likes, 0 repeats
@p @Pi_rat @laurel @SuperDicq for music i agree, for images (memes/arts) etc - not really.
(DIR) Post #Avf3FNhJKdfoRFcefI by laurel
2025-06-30T16:12:06.228135Z
2 likes, 0 repeats
@p @SuperDicq @Pi_rat @hj What happens if you've got a couple of million images and you start moving things around? Does it go through all moved directories and recompute hashes?
Things get even worse with architectures that support publishing/sharing your images and tags with others.
(DIR) Post #Avf3PDOGVatFMpP3fU by hj@shigusegubu.club
2025-06-30T16:13:51.421292Z
2 likes, 0 repeats
@laurel @Pi_rat @p @SuperDicq you don't even have to have millions, tens of thousands are already unwieldy
[attachment: Screenshot_20250630_191327.png]
(DIR) Post #Avf43ggmznVloqMmNU by feld@friedcheese.us
2025-06-30T16:21:10.273654Z
1 likes, 0 repeats
@SuperDicq @hj @p @laurel @Pi_rat
> move copies of all your files into it's own folder structure.
someone just patch it to add a setting so it does this with hardlinks and then it doesn't matter as much
(DIR) Post #Avf4j90qaubZXHW5Ro by p
2025-06-30T16:28:41.269886Z
0 likes, 0 repeats
@hj @Pi_rat @SuperDicq @laurel I don't even like gtk doing thumbnails.
(DIR) Post #Avf4zHauSJdMmi50Ua by SuperDicq@minidisc.tokyo
2025-06-30T16:31:31.892Z
1 likes, 0 repeats
@p@fsebugoutzone.org @hj@shigusegubu.club @laurel@fsebugoutzone.org @Pi_rat@freesoftwareextremist.com I use VLC as my music player for this exact purpose "open directory" is all I need.
(DIR) Post #Avf537Se4aRGW02lea by p
2025-06-30T16:32:17.870721Z
1 likes, 0 repeats
@laurel @Pi_rat @SuperDicq @hj
> Does it go through all moved directories and recompute hashes?
Does what?
(DIR) Post #Avf5WiDsTLx0vaFNtg by SuperDicq@minidisc.tokyo
2025-06-30T16:37:23.683Z
0 likes, 0 repeats
@p@fsebugoutzone.org @laurel@fsebugoutzone.org @Pi_rat@freesoftwareextremist.com @hj@shigusegubu.club Why do you even need hashes? The code that was just posted simply has a database that stores where things are. Maybe hashes could be useful for things like deduplication but that's outside of what this program is intended to do.
(DIR) Post #Avf5gO2cMVSBEPareC by p
2025-06-30T16:39:23.670198Z
0 likes, 0 repeats
@hj @laurel @Pi_rat @SuperDicq I'm not quite at "tens of thousands" yet.
[attachment: memes.png]
(DIR) Post #Avf5ioHLmYMjBXdkR6 by laurel
2025-06-30T16:39:49.948370Z
1 likes, 0 repeats
@p @Pi_rat @SuperDicq @hj How is it going to know it's the same files. If there are any discrepancies between its internal representation and the files on disc, it will have to recompute image hashes.
(DIR) Post #Avf5uBrCSWwNUspph2 by p
2025-06-30T16:41:53.348840Z
0 likes, 0 repeats
@feld @SuperDicq @Pi_rat @hj @laurel
> and then it doesn't matter as much
Some filesystems don't support hard links.
(DIR) Post #Avf60eYGgb42EI3sMC by feld@friedcheese.us
2025-06-30T16:43:01.000057Z
1 likes, 0 repeats
@p @Pi_rat @laurel @SuperDicq @hj which desktop OS filesystem does not support hardlinks?
(DIR) Post #Avf69CZg4f6m48nU8W by laurel
2025-06-30T16:44:36.131422Z
1 likes, 0 repeats
@SuperDicq @hj @p @Pi_rat Because I want something more robust.
(DIR) Post #Avf6ETexykdWVEmpHM by p
2025-06-30T16:45:33.332759Z
1 likes, 0 repeats
@SuperDicq @hj @laurel @Pi_rat Yeah, I use cmus. `:add sound/` and there it all is. I don't know how easy it is to remap keys in VLC, but I actually built cmus on my n900 so that I could push keyboard buttons without looking at the device and crashing my car. (I still use the uConsole like this.)
(DIR) Post #Avf6NFPWKHW4urkAjI by p
2025-06-30T16:47:08.437060Z
0 likes, 0 repeats
@SuperDicq @hj @laurel @Pi_rat Yeah; I think that's how you want to do it. Just tag paths.
(DIR) Post #Avf6SxXvesaDFSOsz2 by p
2025-06-30T16:48:10.373873Z
1 likes, 0 repeats
@laurel @Pi_rat @SuperDicq @hj I am not sure what thing is under discussion. Does what make hashes?
(DIR) Post #Avf6Sxiv01NDnXXf72 by Pi_rat@freesoftwareextremist.com
2025-06-30T16:48:09.323191Z
2 likes, 0 repeats
@laurel @p @SuperDicq @hj I could add hashes as config option...
(DIR) Post #Avf6VsNvIx7u6q6x16 by pwm@darkdork.dev
2025-06-30T16:48:39.964356Z
2 likes, 1 repeats
@Pi_rat @p @SuperDicq Hopefully some of this is helpful to you :)

init.sql:
If you are not writing portable sql schemas, then it is not necessary to write "integer not null primary key autoincrement." autoincrement incurs a performance penalty and should be omitted for integer primary keys [1].
Additionally, you may safely use datatype TEXT to store the names of your tags, as under the hood, sqlite will treat these the same. In fact, if you wish, you do not have to specify a datatype at all [2].

tagg.py:
The first, biggest thing to consider is the logic related to applying tags to images. Take a look at the non-standard (but very useful) upsert clause [3]. Present in both sqlite and postgresql, it allows you to handle a uniqueness constraint violation inside the query, without using program logic.
For instance this would let you write something like:
    insert into image_tags values (:id, :tag) on conflict do nothing
This turns duplicate tag application into a no-op, saving you some trouble and making your code nicer.
Finally, in the sqlite implementation, cursor.executemany() is functionally equivalent to manually iterating over a loop yourself, and in this case would not require building a list of dictionaries manually to iterate over, if you are concerned with resource usage/speed.

helpers.py

init_db:
It is not necessary to manually parse the database initialization script and execute the commands one at a time. Instead use cursor.executescript() with the loaded contents of the file as a parameter.

add_images:
It may be interesting to take an alternative approach to loading all images known in the directory, and instead repeatedly ask the database if a given file is already present. e.g.
    select 1 from images where dir_id = :dir_id and image = :image
This will either return 1 or you will get an empty result set from your cursor. This will probably be faster and will definitely use less memory.
Also, in general prefer named parameters over positional ones (:dir_id, :image) is better than (?, ?). This is stylistic and a personal choice so buyer beware.

create_insert_tuple:
This whole method can probably be gotten rid of by switching your logic to using the upsert syntax to apply tags.

General Notes
Where are you actually committing to the database? AFAICT you're never writing anything to disk unless somehow executemany is autocommitting???
In general, shift all logic related to processing data into the database. It is very good at doing those tasks very fast. Brushing up on sql will go a long way here towards simplifying your python code as much as possible. The sqlite docs are wonderful.
Try to avoid doing data processing in pure python, it is very not fast at that.

[1] Autoincrement in SQLite
[2] Notes on Datatypes
[3] Upsert
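A minimal sketch of three of the suggestions above (executescript for initialization, "on conflict do nothing" for duplicate tags, and the "select 1" existence check), using an in-memory database. The schema here is a hypothetical stand-in for init.sql, and the upsert clause assumes the bundled SQLite is version 3.24 or newer.

```python
import sqlite3

# Hypothetical stand-in for init.sql; the real schema may differ.
INIT_SQL = """
create table if not exists images (
    id integer primary key,
    dir_id integer not null,
    image text not null,
    unique (dir_id, image)
);
create table if not exists image_tags (
    image_id integer not null references images(id),
    tag text not null,
    primary key (image_id, tag)
);
"""

db = sqlite3.connect(":memory:")
# executescript runs every statement in the string in one call.
db.executescript(INIT_SQL)

db.execute("insert into images (dir_id, image) values (:dir_id, :image)",
           {"dir_id": 1, "image": "cat.png"})

# Applying the same tag twice: the second insert hits the primary key
# and is silently skipped by "on conflict do nothing".
rows = [{"id": 1, "tag": "meme"}, {"id": 1, "tag": "meme"}, {"id": 1, "tag": "cat"}]
db.executemany(
    "insert into image_tags values (:id, :tag) on conflict do nothing", rows
)
db.commit()

tag_count = db.execute("select count(*) from image_tags").fetchone()[0]

# Existence check: one row if the file is known, nothing otherwise.
check = "select 1 from images where dir_id = :dir_id and image = :image"
known = db.execute(check, {"dir_id": 1, "image": "cat.png"}).fetchone()
unknown = db.execute(check, {"dir_id": 1, "image": "dog.png"}).fetchone()
print(tag_count, known, unknown)
```

The explicit db.commit() is there because the sqlite3 module opens a transaction implicitly before an insert in its default mode, and nothing reaches disk until that transaction is committed.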
(DIR) Post #Avf7GDYYvBit5oBVyK by p
2025-06-30T16:57:04.503323Z
0 likes, 0 repeats
@feld @Pi_rat @SuperDicq @hj @laurel
> which desktop OS filesystem does not support hardlinks?
When you buy a USB stick, what is the filesystem format?
Hard links also can't cross FS boundaries. People can run into this without knowing: Armbian, for example, likes to mount a tmpfs over some directories and then sync so as not to wear down your SD card.
If people remove the images, what are you going to do? Are you going to track it to make sure it's the only link? What about writes in place, like tweaking the EXIF data?
It's better to just deal with files where they are.
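For illustration, here is a hedged sketch of how a program could attempt a hard link and fall back to a copy when the filesystem refuses. The helper names link_or_copy and same_filesystem are made up for this example and do not come from Hydrus or tagg.

```python
import os
import shutil
import tempfile

def link_or_copy(src, dst):
    """Try to hard-link src to dst; fall back to a copy when the
    filesystem refuses (e.g. FAT, or src and dst on different filesystems)."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        shutil.copy2(src, dst)
        return "copy"

def same_filesystem(a, b):
    """Compare device IDs. Matching IDs are necessary for a hard link,
    but not sufficient: the filesystem itself must support links."""
    return os.stat(a).st_dev == os.stat(b).st_dev

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "image.png")
    dst = os.path.join(tmp, "linked.png")
    with open(src, "wb") as f:
        f.write(b"not really a png")
    method = link_or_copy(src, dst)
    with open(dst, "rb") as f:
        content = f.read()
    on_same_fs = same_filesystem(src, dst)
print(method, on_same_fs)
```

Which branch runs depends on the filesystem holding the temporary directory, so the printed method may be either value.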
(DIR) Post #Avf7KuVn9lPjTU7uYS by laurel
2025-06-30T16:57:55.391936Z
2 likes, 0 repeats
@Pi_rat @SuperDicq @hj @p I like the simple approach that you have chosen. Focus on having a working prototype and perhaps later you could add hashes as a fallback.
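A rough sketch of what "hashes as a fallback" could look like: paths stay the primary key, and content hashes are only computed when a stored path has gone missing. The functions file_sha256 and relocate and the path-to-hash mapping are assumptions for this example, not part of the posted code.

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=1 << 20):
    """Hash file contents in chunks so large images are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def relocate(known, directory):
    """known maps stored path -> stored hash. For every stored path that no
    longer exists, look for a file under `directory` with the same hash and
    return {old_path: new_path}."""
    missing = {h: p for p, h in known.items() if not os.path.exists(p)}
    moved = {}
    if not missing:
        return moved  # nothing moved, so nothing is rehashed
    for root, _dirs, files in os.walk(directory):
        for name in files:
            candidate = os.path.join(root, name)
            h = file_sha256(candidate)
            if h in missing:
                moved[missing[h]] = candidate
    return moved

with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old.png")
    with open(old, "wb") as f:
        f.write(b"image bytes")
    known = {old: file_sha256(old)}
    new = os.path.join(tmp, "sub", "new.png")
    os.makedirs(os.path.dirname(new))
    os.rename(old, new)  # simulate the user moving the file
    moved = relocate(known, tmp)
    found_new = moved.get(old) == new
print(found_new)
```

With identical files in several places the last match wins here; a real implementation would need a policy for duplicates.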
(DIR) Post #Avf9rfxqSriJox037I by feld@friedcheese.us
2025-06-30T17:26:14.037735Z
1 likes, 0 repeats
@p @Pi_rat @laurel @SuperDicq @hj
> When you buy a USB stick, what is the filesystem format?
FAT/exFAT is not a desktop OS filesystem. Neither is NFS or anything else you can think of that doesn't support hardlinks.
> Hard links also can't cross FS boundaries.
okay, but that's why you keep your Hydrus storage next to where you normally store these files... on the same filesystem...
> If people remove the images, what are you going to do? Are you going to track it to make sure it's the only link?
yeah it's actually quite easy I do it all the time
> What about writes in place, like tweaking the EXIF data?
I do this all the time with my music library and it doesn't break the hardlink and leave me with two different files?
like I literally suggested that this be an OPTION and you came in here firing in all directions like "ACKCHYUALLY YOU CAN'T DO THAT BCAUSE I CAN INVENT AN INCOMPATIBLE USE CASE"
get a hobby and stop always trying to be the smartest person in the room, it's not a good look
(DIR) Post #AvfAtT2XaUSgGoc91k by p
2025-06-30T17:37:47.016761Z
0 likes, 0 repeats
@feld @Pi_rat @SuperDicq @hj @laurel
> FAT/exFAT is not a desktop OS filesystem.
Either we're talking about people that don't know whether their USB stick is FAT or not, or we're talking about people that might not be running a "desktop OS filesystem". Samba isn't a super rare thing, either.
> like I literally suggested that this be an OPTION and you came in here firing in all directions like
I just said it was not always going to work, you argued, I replied. I don't know what you want from me. If you don't want to argue about it, don't argue about it.
> get a hobby and stop always trying to be the smartest person in the room, it's not a good look
You jumped in the thread to suggest hard links; I just said it's not always going to work. I didn't post caricatures at you, speculate that you had some motivation like "trying to be the smartest person in the room": you jumped in, suggested that someone else patch this software to include your idea, I have never even heard of this program before this thread, I just think hard links are a bad idea, and now you're acting like I cut you off in traffic. Chill, dude.
[attachment: firing-in-all-directions.png]
(DIR) Post #AvfBKfmwIVMrcxI0QK by p
2025-06-30T17:42:42.023344Z
0 likes, 0 repeats
@RedTechEngineer @Pi_rat @SuperDicq @feld @hj @laurel I think most of mine are either FAT because they are for conveying files between good computers and Windows machines or they're ext.
(DIR) Post #AvfDq029H9VlIoT86q by snacks@netzsphaere.xyz
2025-06-30T18:10:39.643143Z
1 likes, 0 repeats
@p @Pi_rat @laurel @SuperDicq @hj even my mp3 player can do that
(DIR) Post #AvfKgakebWN5xpkNAO by Pi_rat@freesoftwareextremist.com
2025-06-30T19:27:29.185636Z
1 likes, 0 repeats
@pwm @p @SuperDicq

init.sql
> If you are not writing portable sql schemas, then it is not necessary to write "integer not null primary key autoincrement." autoincrement incurs a performance penalty and should be omitted for integer primary keys [1].
I had read about autoincrement being slow but I thought the alternative to it would be me generating unique ids myself. I will fix the code to use rowids.
> Additionally, you may safely use datatype TEXT to store the names of your tags, as under the hood, sqlite will treat these the same.
I was going for VARCHAR(255) (max file name length in ext4) but then a friend pointed out that the database should not be doing those checks. My reasoning was that if I set a limit on text it will help the db have a standard row size. I also later came across that sqlite does not actually apply this constraint, and it is my error that I did not change it.

tagg.py
> The first, biggest thing to consider is the logic related... without using program logic.
This did come up but I took a break and later I did not have any motivation. I forced myself for the past few days but this is becoming fun again so I will try to put in much more effort. And as you said it is non-standard so it felt weird; I was originally going with try/except and logging duplicates.
> Finally, in the sqlite implementation, cursor.executemany() is functionally equivalent to manually iterating over a loop yourself, and in this case would not require building a list of dictionaries manually to iterate over, if you are concerned with resource usage/speed.
I thought it would be optimized. I will refactor around this. I was going for dicts since qmark style is going to be deprecated (and as you have said further along, I agree dicts are cleaner).

helpers.py
> init_db...
I had a strong feeling that this could be done in a better way. I was split between that and having the whole schema in the init function (but that did not sit right either). I did come across executescript in the docs but in the excitement of working on the tagging logic I did not give it much attention. Will fix this.
> add_images: It may be interesting to take an alternative approach to loading all images known in the directory, and instead repeatedly ask the database if a given file is already present. e.g. select 1 from images where dir_id = :dir_id and image = :image This will either return 1 or you will get an empty result set from your cursor. This will probably be faster and will definitely use less memory.
As you have pointed out, I'm thinking of upsert syntax to just ignore duplicate inserts. Originally I was trying to figure out a way to have "SELECT id FROM images WHERE image IN ?" and I tried every combination of qmark and dicts but it was no dice. It could be done with string formatting, but I would rather not use it since I'm doing parametrized queries everywhere else; that would be bad.
> create_insert_tuple: This whole method can probably be gotten rid of by switching your logic to using the upsert syntax to apply tags.
It would be extremely cool and I will try hard for this. This one function was what kept me debugging all of today, and the problem was I was passing "directory" instead of "dir_id" in tagg.py. There was also a problem with an if statement where I errantly assumed empty list == False but it was not so. I also don't like setting image_id for every image, and allocating it is inefficient.

General Notes
> Where are you actually committing to the database? AFAICT you're never writing anything to disk unless somehow executemany is autocommitting???
Ah yes, I have autocommit turned on as long as there are no exceptions. I will add an explicit commit. I was not even closing the db, mistakenly thinking that the context manager will handle closing it for me. Just learned today that I have to close the cursor too.
> In general, shift all logic related to processing data into the database. It is very good at doing those tasks very fast. Brushing up on sql will go a long way here towards simplifying your python code as much as possible. The sqlite docs are wonderful. Try to avoid doing data processing in pure python, it is very not fast at that.
Is it correct to use "sqlite3" directly, using functions and passing the cursor around them, writing sql statements inside functions? I'm not sure if there is a standard way to do this but personally it seems off. Since python is OOP, shouldn't I be creating classes and adding methods? i.e. (cursor.select_image(image_id) == cursor.execute("SELECT image FROM images WHERE id = ?", (image_id,)))

My absolute first draft of it compared to this is such a mess, I was not using argparse and the schema was HORROR. I was certain I would have to make a separate table for every tag and it would hold image ids. A friend pointed me to the correct many-to-many relation table and that was such a bombshell moment. I learned sql at the end of last year and at that time I did make correct many-to-many schemas but since then I have not kept up with it.
I LOVE sql(ite) and will read all the Docs you have linked with fervour. Thank you VERY much for giving such a detailed review. I will follow up with you when I implement queries.
(apolocheese for typos and broken sentences, I deleted the post because it had formatting issues, hope they are fixed this time)
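On the "WHERE image IN ?" problem: a single placeholder cannot stand for a whole list in sqlite3, but the placeholders themselves can be generated while the values stay bound. A minimal sketch, assuming a simplified images table and a hypothetical helper ids_for_images:

```python
import sqlite3

def ids_for_images(db, names):
    """Look up ids for a list of image names with a parametrized IN clause."""
    # An empty list is falsy under "not", even though [] == False is False.
    if not names:
        return {}
    # Only the "?" marks are formatted into the SQL; the names themselves
    # are still bound as parameters, so this is not string-built SQL data.
    marks = ", ".join("?" * len(names))
    sql = f"select image, id from images where image in ({marks})"
    return dict(db.execute(sql, list(names)))

db = sqlite3.connect(":memory:")
db.execute("create table images (id integer primary key, image text unique)")
db.executemany("insert into images (image) values (?)",
               [("a.png",), ("b.png",), ("c.png",)])

found = ids_for_images(db, ["a.png", "c.png", "missing.png"])
print(found)
```

SQLite limits how many parameters one statement may bind, so very long lists would need to be split into batches.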
(DIR) Post #AvfKqDPMFX40SfNIMC by Pi_rat@freesoftwareextremist.com
2025-06-30T19:29:13.444222Z
0 likes, 0 repeats
Yeah they are not (>~<)
(DIR) Post #AvfLq59N3HYknvOh0a by p
2025-06-30T19:40:25.619358Z
1 likes, 0 repeats
@Pi_rat @pwm @SuperDicq
> but then friend pointed that database should not be doing those checks
You definitely want the DB to enforce as much as you can, but filename length, maybe not. sqlite3 data types aren't enforced, that is correct, but they do usually act as hints to your ORM or library.
> Is it correct to use "sqlite3" directly and using functions and passing cursor around them, writing sql statements inside function.
For a small enough program, that's fine. But you should probably wrap it in an object so you don't have to deal with details of the query or the DB in other functions, keep the logic clean so functions are topical and legible. I don't know how Python deals with iterators, maybe better to model a cursor as a thing that yields rows to whoever asks.
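One possible shape for such a wrapper in Python is sketched below: callers see methods and plain values, and a generator method plays the role of "a thing that yields rows". The class name TagStore, its methods, and the single-table schema are all hypothetical simplifications, not the structure of the posted code.

```python
import sqlite3
from contextlib import closing

class TagStore:
    """Hypothetical wrapper that keeps SQL out of the calling code."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            create table if not exists image_tags (
                image text not null,
                tag text not null,
                primary key (image, tag)
            );
        """)

    def tag(self, image, tag):
        with self.db:  # commits on success, rolls back on exception
            self.db.execute(
                "insert into image_tags values (?, ?) on conflict do nothing",
                (image, tag),
            )

    def images_with(self, tag):
        """Generator: yields image names one row at a time."""
        with closing(self.db.cursor()) as cur:
            cur.execute(
                "select image from image_tags where tag = ? order by image",
                (tag,),
            )
            for (image,) in cur:
                yield image

    def close(self):
        self.db.close()

store = TagStore(":memory:")
store.tag("a.png", "meme")
store.tag("b.png", "meme")
store.tag("a.png", "meme")  # duplicate, ignored by the upsert clause
memes = list(store.images_with("meme"))
store.close()
print(memes)
```

The rest of the program would then only call store.tag(...) and iterate over store.images_with(...), without touching cursors or SQL.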
(DIR) Post #AvfNhGep6Am9nO28x6 by Pi_rat@freesoftwareextremist.com
2025-06-30T20:01:13.340643Z
3 likes, 0 repeats
@p @pwm @SuperDicq (Idk why but markdown does not give paragraph)
> You definitely want the DB to enforce as much as you can, but filename length, maybe not. sqlite3 data types aren't enforced, that is correct, but they do usually act as hints to your ORM or library.
:yousoro:
> For a small enough program, that's fine. But you should probably wrap it in an object so you don't have to deal with details of the query or the DB in other functions, keep the logic clean so functions are topical and legible.
I will look into this, Im not sure how I would go about it at all since I have very surface level understanding of OOP.
> I don't know how Python deals with iterators, maybe better to model a cursor as a thing that yields rows to whoever asks.
I'm passing it around to execute relevant queries on it.
Ill go sleep now~~ oyasumi :rms_suya:
(DIR) Post #AvfX7qvJmfWLOCV5ai by pwm@darkdork.dev
2025-06-30T21:46:51.422885Z
2 likes, 0 repeats
@Pi_rat @p @SuperDicq
> I had read about autoincrement being slow but I thought alternative to it would be me generating unique ids myself. I will fix code to use rowids.
I may not have made myself clear here: you need not fuss with rowids yourself, you may simply use integer primary key.

> I thought it would be optimized. I will refactor around this. I was going for dicts since qmark style is going to be deprecated (and as furthur along you have said, I agree dicts are cleaner)
This is a good intuition! In other database drivers (psycopg2 for example), executemany is optimized.

> As you have pointed out, Im thinking of upsert syntax to just ignore duplicate inserts. Originally I was trying to figure out a way to have "SELECT id FROM images WHERE image IN ?" and I tried every combination of qmark and dicts but it was no dice but could be done with string formatting, but I rather not use it since Im doing parametrized queries everywhere else, that would be bad.
Good catch here, I don't know how I missed using upsert here but you're absolutely right, this is a good use case for it.

> Ah yes I have autocommit turned on as long as there are no exceptions, I will add explicit commit. I was not even closing db mistaking that context manager will handle closing it for me. Just learned today that I have to close cursor too.
When you use the actual python context manager syntax (via contextlib.closing, since an sqlite3 connection used directly in a with statement only commits or rolls back, and sqlite3 cursors are not context managers themselves):

    from contextlib import closing
    with closing(sqlite3.connect("yourdb.sqlite")) as db:
        with closing(db.cursor()) as cursor:
            # do your query etc etc
            ...
        # here the cursor is now closed
        db.commit()
    # here the database connection is now closed

In this case, the python context manager will automatically call the appropriate close function as you fall out of the indented block. If you do it in the imperative style you have been doing you must call close() on each cursor and database connection yourself. In a purely technical sense, all that is being cleaned up when the program exits, and this is not a long-running server process. But it's good practice to use either of these two methods to keep things tidy yourself. Many other objects use this protocol; iirc you used it when opening and reading a file.

> Is it correct to use "sqlite3" directly and using functions and passing cursor around them, writing sql statements inside function. Im not sure if there is standard way to do this but personally it seems off.
You may pass the connection or cursor object in a function call. However, things get hairier if you start to multithread or multiprocess. sqlite3 module objects are not threadsafe and you will get programming errors if you try to use e.g. a cursor outside of the thread it was created in. This is not really a relevant concern here, though.
As for code style, it might be slightly "cleaner" to pass the database object to functions that use the database, create a cursor for the scope of the function, and close the cursor before you return. This will still allow the callee function control over, e.g. committing to the database in case of a DML statement.

> Since python is OOP shouldnt I be creating classes and adding methods? i.e (cursor.select_image(image_id) == cursor.execute("SELECT image FROM images WHERE id = ?", (image_id,))
It is bad practice to attach methods to objects after the object is instantiated, which is what you would be doing were you to do as you stated in this example.
I wouldn't get too caught up in trying to follow any particular paradigm here, your codebase is small and it's a for-fun project. Short of using an ORM to abstract away handling the database connection, carefully passing a connection object around is a-ok, and what the ORM is doing under the hood, anyway.

> I LOVE sql(ite) and will read all the Docs you have linked with fervour. Thank you VERY much for giving such detailed review. I will follow you up when I implement queries.
You're welcome! SQLite is one of my favorite pieces of software, too, if you couldn't tell! I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
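A minimal sketch of the advice in the post above, not the project's actual code. The `images` table and its `image` column come from the queries quoted in the thread; the UNIQUE constraint, the helper names `add_images` and `image_ids`, and the in-memory database are assumptions for illustration. It uses `contextlib.closing` because, in the standard `sqlite3` module, `with connection:` only commits or rolls back and cursors are not context managers themselves. The `ON CONFLICT ... DO NOTHING` form needs SQLite 3.24 or newer.

```python
import sqlite3
from contextlib import closing

SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    id    INTEGER PRIMARY KEY,   -- alias for the rowid, no AUTOINCREMENT needed
    image TEXT NOT NULL UNIQUE   -- UNIQUE is an assumption; the upsert relies on it
)
"""

def add_images(db, names):
    """Insert image names, silently skipping ones already present."""
    with closing(db.cursor()) as cur:
        cur.executemany(
            "INSERT INTO images (image) VALUES (:image) "
            "ON CONFLICT (image) DO NOTHING",
            [{"image": name} for name in names],
        )
    db.commit()

def image_ids(db, names):
    """Return ids for the given names using a parametrized IN (...) list."""
    names = list(names)
    if not names:
        return []
    # Only the "?" placeholders are formatted in; the values stay bound parameters.
    placeholders = ", ".join("?" for _ in names)
    query = f"SELECT id FROM images WHERE image IN ({placeholders}) ORDER BY id"
    with closing(db.cursor()) as cur:
        return [row[0] for row in cur.execute(query, names)]

with closing(sqlite3.connect(":memory:")) as db:
    db.execute(SCHEMA)
    add_images(db, ["a.png", "b.png"])
    add_images(db, ["b.png", "c.png"])   # "b.png" is ignored the second time
    ids = image_ids(db, ["a.png", "c.png"])
    total = db.execute("SELECT COUNT(*) FROM images").fetchone()[0]
```

The connection is passed into each function, and each function opens and closes its own cursor, which is the style suggested in the post. Building the placeholder string is the usual workaround for `IN ?`, since a single placeholder cannot bind a whole list.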
(DIR) Post #AvfbSTrd75gDllrodU by laurel
2025-06-30T22:35:26.517956Z
3 likes, 0 repeats
@Pi_rat @p @SuperDicq @pwm
> I will look into this, Im not sure how I would go about it at all since I have very surface level understanding of OOP.
I think he meant to make a class specifically for database operations. The database connection/initialization code will go into the class init and the cursor will be a member variable only accessible by the member functions. Then in your main routine you instantiate the class, let's say as an object called db, which you then use to call the methods, providing only the information necessary. For instance db.addtag(path, newTagString), db.deletePath(path), etc.
It's been a while since I wrote any Python btw, I only use jupyter-notebook with some forecasting and data analysis libs nowadays.
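The class described above might look roughly like the following sketch. Everything here is hypothetical: the `TagDB` name, the three-table schema, and the snake_case method names (`add_tag`, `delete_path`, adapted from the `db.addtag` and `db.deletePath` examples in the post) are assumptions, not the project's real layout.

```python
import sqlite3

class TagDB:
    """Hypothetical wrapper that owns the connection and hides the SQL."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.executescript("""
            CREATE TABLE IF NOT EXISTS images (
                id INTEGER PRIMARY KEY, image TEXT NOT NULL UNIQUE);
            CREATE TABLE IF NOT EXISTS tags (
                id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
            CREATE TABLE IF NOT EXISTS image_tags (
                image_id INTEGER NOT NULL REFERENCES images(id),
                tag_id   INTEGER NOT NULL REFERENCES tags(id),
                PRIMARY KEY (image_id, tag_id));
        """)

    def add_tag(self, path, tag):
        with self._conn:  # commits on success, rolls back on an exception
            self._conn.execute(
                "INSERT OR IGNORE INTO images (image) VALUES (?)", (path,))
            self._conn.execute(
                "INSERT OR IGNORE INTO tags (tag) VALUES (?)", (tag,))
            self._conn.execute(
                "INSERT OR IGNORE INTO image_tags (image_id, tag_id) "
                "SELECT i.id, t.id FROM images AS i, tags AS t "
                "WHERE i.image = ? AND t.tag = ?", (path, tag))

    def delete_path(self, path):
        with self._conn:
            self._conn.execute(
                "DELETE FROM image_tags WHERE image_id = "
                "(SELECT id FROM images WHERE image = ?)", (path,))
            self._conn.execute("DELETE FROM images WHERE image = ?", (path,))

    def tags_for(self, path):
        rows = self._conn.execute(
            "SELECT t.tag FROM tags AS t "
            "JOIN image_tags AS it ON it.tag_id = t.id "
            "JOIN images AS i ON i.id = it.image_id "
            "WHERE i.image = ? ORDER BY t.tag", (path,))
        return [row[0] for row in rows]

    def close(self):
        self._conn.close()

db = TagDB()
db.add_tag("cat.png", "animal")
db.add_tag("cat.png", "cute")
```

The caller only ever sees paths and tag strings; the connection and the SQL stay inside the class.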
(DIR) Post #AvffdfdQ5cpHHIoNYu by pwm@darkdork.dev
2025-06-30T23:22:15.050962Z
3 likes, 0 repeats
@laurel @Pi_rat @p @SuperDicq Abstracting your database connection into a class is usually not strictly necessary unless you need the abstraction, for instance if you were trying to support many database connectors, not just one. Otherwise you would wind up needlessly proxying function calls and little else. Unless you are going to be implementing novel or QoL features for yourself it's not really necessary in at least this specific case.
> db.addtag(path, newTagString), db.deletePath(path)
There's a separate approach where Path would be a model that itself interfaces with the db, e.g. Path.add(path) or Path(path).delete(), which is a pattern you might see if you were using an ORM like sqlalchemy, or the django ORM.
There's more than one way to skin a cat but his approach and his scope are well matched; there is neither the requirement nor the need to inject OOPisms when the imperative code is clean and succinct.
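For comparison, the model-style pattern mentioned above could be hand-rolled along these lines. This is only an illustration of the shape of `Path.add(path)` and `Path(path).delete()`; it is not the sqlalchemy or Django API, and the module-level `_conn`, the `exists` helper, and the schema are assumptions.

```python
import sqlite3

# Module-level connection shared by the model (an assumption for this sketch).
_conn = sqlite3.connect(":memory:")
_conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY, image TEXT NOT NULL UNIQUE)")

class Path:
    """Hypothetical model: each instance stands for one row in images."""

    def __init__(self, path):
        self.path = path

    @classmethod
    def add(cls, path):
        # Insert the row if it is missing and hand back a model object for it.
        with _conn:
            _conn.execute(
                "INSERT OR IGNORE INTO images (image) VALUES (?)", (path,))
        return cls(path)

    def exists(self):
        row = _conn.execute(
            "SELECT 1 FROM images WHERE image = ?", (self.path,)).fetchone()
        return row is not None

    def delete(self):
        with _conn:
            _conn.execute("DELETE FROM images WHERE image = ?", (self.path,))

p = Path.add("cat.png")
```

Here the model talks to the database itself, whereas in the previous style a single db object exposes every operation. For a script of this size either is more structure than strictly needed, as the post says.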
(DIR) Post #Avg8I7YgUGIaBJfnRg by p
2025-07-01T04:43:20.124140Z
1 likes, 0 repeats
@Pi_rat @SuperDicq @pwm
> (Idk why but markdown does not give paragraph)
Probably you've got to double-space.
> I will look into this, Im not sure how I would go about it at all since I have very surface level understanding of OOP.
OOP is a convenient way to model this, but essentially it's just keeping things organized by topic. Separation of concerns makes things much more legible is all.
> I'm passing it around to execute relevant queries on it.
Ah, okay. So what I was talking about was you run a query and then encapsulate the results in an iterator or something so the function that uses the results just gets fed this iterator that sends back rows and then it closes its own cursor when it gets to the last one and starts returning nils, whatever those are in Python. (I think it's `null`? I've written basically no Python from scratch, just editing Python. It's basically the same language as all the 90s scripting languages.)
If it's a singleton, it's not so bad to just make it globally accessible instead of passing it around. (For a good time, put it in a memoized function that opens the DB on first call; this is a little nicer than making it a global variable. Generally in a bigger project you'll end up including or writing something to manage a connection pool, but the code is still small and it's sqlite3 so no pool anyway. But don't get carried away writing code to manage things that you don't need managed yet, you know what I mean.)
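Both ideas in the post above can be sketched in a few lines of Python, assuming the same hypothetical `images` table and an in-memory database standing in for the real tags.db. `get_db` and `iter_images` are made-up names. A Python generator does not return a nil value at the end; it simply stops, and the `finally` clause closes the cursor at that point.

```python
import sqlite3
from functools import lru_cache

@lru_cache(maxsize=None)
def get_db():
    """Open the database on first call; later calls return the same connection."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real tags.db path
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, image TEXT NOT NULL UNIQUE)")
    return conn

def iter_images():
    """Yield image names one by one; the cursor is closed once iteration ends."""
    cur = get_db().cursor()
    try:
        for (image,) in cur.execute("SELECT image FROM images ORDER BY id"):
            yield image
    finally:
        cur.close()

with get_db() as db:  # commits the inserts when the block exits cleanly
    db.executemany(
        "INSERT INTO images (image) VALUES (?)", [("a.png",), ("b.png",)])

names = list(iter_images())
```

The function that consumes `iter_images()` never touches a cursor or a connection; it just loops over names.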
(DIR) Post #Avg9kdjMZ5OQ1ZpZEO by p
2025-07-01T04:59:41.671513Z
0 likes, 0 repeats
@munir @RedTechEngineer @Pi_rat @SuperDicq @feld @hj @laurel How so?
(DIR) Post #AvgCb1HURx4q20tj5E by Pi_rat@freesoftwareextremist.com
2025-07-01T05:31:33.010610Z
0 likes, 0 repeats
@pwm @p @SuperDicq
> I may have not made myself clear here, you need not fuss with rowids yourself, you may simply use integer primary key
You were clear, I just phrased my sentence weird.
> In this case, the python context manager will automatically call the appropriate close function as you fall out of the indented block. ... This will still allow the callee function control over, e.g. committing to the database in case of a DML statement.
Noted, will pass connection. (Personally, this will also look better than 2 levels of indentation.)
> Short of using an ORM to abstract away handling the database connection, carefully passing a connection object around is a-ok, and what the ORM is doing under the hood, anyway.
I was looking into sqlalchemy, but combined with everything else my attention was spread too thin, so I left it. Nice to know it's not wrong to pass connection.
> You're welcome! SQLite is one of my favorite pieces of software, too, if you couldn't tell! I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
Eyy :backtogab: !!
I forgot to ask yesterday, how should I be testing this? For now I manually open up interpreter and db and test, but this gets very tedious. My guess is unittests + a test.db which would return expected row(s). Wanted to write them but procrastinated on this and now I shudder at magnitude of what I would have to cover, probably won't be as bad once I get started and write a few.
(DIR) Post #AvgCsRd7jq6CfrKFkG by Pi_rat@freesoftwareextremist.com
2025-07-01T05:34:42.007350Z
1 likes, 0 repeats
@pwm @p @SuperDicq (again, formatting error)
> I may have not made myself clear here, you need not fuss with rowids yourself, you may simply use integer primary key
You were clear, I just phrased my sentence weird.
> In this case, the python context manager will automatically call the appropriate close function as you fall out of the indented block. ... This will still allow the callee function control over, e.g. committing to the database in case of a DML statement.
Noted, will pass connection. (Personally, this will also look better than 2 levels of indentation.)
> Short of using an ORM to abstract away handling the database connection, carefully passing a connection object around is a-ok, and what the ORM is doing under the hood, anyway.
I was looking into sqlalchemy, but combined with everything else my attention was spread too thin, so I left it. Nice to know it's not wrong to pass connection.
> You're welcome! SQLite is one of my favorite pieces of software, too, if you couldn't tell! I'm a sql and python guy so I'm happy to help out and had fun reading through this and providing commentary today.
Eyy!! :backtogab:
I forgot to ask yesterday, how should I be testing this? For now I manually open up interpreter and db and test, but this gets very tedious. My guess is unittests + a test.db which would return expected row(s). Wanted to write them but procrastinated on this and now I shudder at magnitude of what I would have to cover, probably won't be as bad once I get started and write a few.
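One possible shape for the tests asked about above, sketched with the standard `unittest` module. Instead of a test.db file it uses a fresh in-memory database per test, so nothing has to be cleaned up on disk. `init_db` and `add_image` are hypothetical stand-ins for the project's own functions, and the schema is an assumption.

```python
import sqlite3
import unittest

def init_db(conn):
    """Hypothetical stand-in for the project's --init step."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images "
        "(id INTEGER PRIMARY KEY, image TEXT NOT NULL UNIQUE)")

def add_image(conn, name):
    """Hypothetical function under test: insert a name, ignore duplicates."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO images (image) VALUES (?)", (name,))

class AddImageTests(unittest.TestCase):
    def setUp(self):
        # Every test starts from an empty in-memory database.
        self.conn = sqlite3.connect(":memory:")
        init_db(self.conn)

    def tearDown(self):
        self.conn.close()

    def _names(self):
        rows = self.conn.execute("SELECT image FROM images ORDER BY id")
        return [row[0] for row in rows]

    def test_add_image_inserts_row(self):
        add_image(self.conn, "cat.png")
        self.assertEqual(self._names(), ["cat.png"])

    def test_duplicate_is_ignored(self):
        add_image(self.conn, "cat.png")
        add_image(self.conn, "cat.png")
        self.assertEqual(self._names(), ["cat.png"])
```

Saved as, say, test_tagg.py (a made-up file name), this runs with `python -m unittest`. Because the functions take the connection as an argument, the tests can hand them a throwaway database without touching the real one.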
(DIR) Post #AvgEl1KbJm28zWFWN6 by Pi_rat@freesoftwareextremist.com
2025-07-01T05:55:45.833182Z
1 likes, 0 repeats
@p @pwm @SuperDicq
> Probably you've got to double-space.
Hope this werks, I was using new line and single space.
> OOP is a convenient way to model this, but essentially it's just keeping things organized by topic ... I think it's null? ... small and it's sqlite3 so no pool anyway.
Noted. It's called "None" around these parts. And yes, I don't think this will be multi-threaded, not even the GUI version.
> But don't get carried away writing code to manage things that you don't need managed yet, you know what I mean.
🫡