Add A Text String for Genres in the Standard export

1JoeB1934
jan 19, 2022, 2:26 pm

I usually take the existing Excel export file and pull out the fields that are essential to my analysis of my library. One field I wish were there is a text string of the genres for each book, like the one that exists for Tags. This could be added to the standard export, but I have a different idea.

Some fields in the complete export are, to me, personal essentials: the ones I have used to describe each book. For me this would include:
Title, author, my rating, Date Read, Tags, Collections, ISBNs, and genres. Others might choose differently, but I could easily create my essentials list if a genre string was added to the big export.

2bnielsen
jan 20, 2022, 7:53 am

>1 JoeB1934: Could you give an example of what you want?

I read >1 JoeB1934: as a wish for an extra column in the tsv file (and the similar export files) where, say, one of Agatha Christie's novels would get:

Fiction;General Fiction;Historical fiction;Mystery

I ask because that seems like what you ask for at the end of the post, but in the first part you write "I have a different idea". But you don't describe what the different idea is? (Or maybe I've gotten too little or too much coffee during the day and can't see it.)

BTW:
The genre names are taken from:
https://www.librarything.com/catalog_bottom.php?specialpage=genre

3JoeB1934
jan 20, 2022, 8:47 am

>2 bnielsen: Yes, your example is exactly what I had in mind.
The different idea was to have a new export which only includes the items I listed as essentials. However, I realize that would be a new export file structure and that it is unrealistic to request a whole new file structure.
Just the example text string you have would be terrific for my purposes.
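For anyone who wants to try this on their own data, here is a minimal sketch of how such a genre string could be built. It assumes you already have one `Book_Id<TAB>Genre` pair per line; that input layout and the function name `genre_strings` are made up for illustration, nothing LT provides in this form.

```shell
# Collapse "Book_Id<TAB>Genre" lines (stdin) into one line per book,
# with the genres joined by ';' in first-seen order.
# The input layout is an assumption, not an LT export format.
genre_strings() {
  awk -F '\t' '
    !($1 in g) { order[++n] = $1; g[$1] = $2; next }   # first genre for this book
    { g[$1] = g[$1] ";" $2 }                            # append further genres
    END { for (i = 1; i <= n; i++) printf "%s\t%s\n", order[i], g[order[i]] }'
}
```

The result is a two-column, tab-separated text that could be joined to the normal export on Book_Id.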

4bnielsen
Edited: jan 20, 2022, 10:41 am

>3 JoeB1934: I took the example from my own database, so I agree. I've created a static list of the books in each genre in my own library.

I.e. listing Genre:Poetry in a style with mostly Book_Id and then copy/pasting into a wiki page. Repeat for each genre.

I've also been experimenting to see if I could get a filtered export to show only a given genre, but I guess that's not implemented.

5lorax
jan 20, 2022, 10:55 am

Yeah, they probably didn't add Genre to the export immediately because they were still tweaking the taxonomy, but it's been stable for a while now. No reason they shouldn't add it.

6bnielsen
Edited: jan 29, 2022, 5:49 am

Ah, this led me into exploring things a bit more.
Tinycat has some of this information:

wget -q https://www.librarycat.org/lib/bnielsen/item/211493689 -O - | sed -e 's/</\n/g' | grep '/search/genre/' | cut -f2 -d\>

gives this:

Technology
Nonfiction

So I can get the genre information for a single book. That's a good start.
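The same pipeline can be wrapped so that it emits a single semicolon-joined string, which is the shape asked for in >1. This is only a sketch: the function name `genres_from_item_html` is made up, it reads already-downloaded HTML on stdin, and it relies on the page markup staying the way it looked when the command above was written.

```shell
# Read the HTML of a TinyCat item page on stdin and print its genres
# as one ';'-joined line. Same idea as the wget pipeline above, with
# tr used instead of sed so it does not depend on GNU sed.
genres_from_item_html() {
  tr '<' '\n' | grep '/search/genre/' | cut -f2 -d'>' | paste -sd';' -
}

# Possible use (needs network access):
#   wget -q https://www.librarycat.org/lib/bnielsen/item/211493689 -O - | genres_from_item_html
```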

On the Tinycat page the words Technology and Nonfiction seem to be links to a genre search. I.e.
https://www.librarycat.org/lib/bnielsen/search/genre/20275895/Nonfiction
but this gives "No results". I don't know if this is a bug or a missing feature.

The same Tinycat page, i.e. https://www.librarycat.org/lib/bnielsen/item/211493689 also gives a Description that I don't think is present in the Library export file.

On a similar note the LT app (android) gives a "Member tags" that I also can't find anywhere else.

(It seems it will be a stormy day here today, so I might explore this a bit more.)

7JoeB1934
jan 29, 2022, 9:55 am

>6 bnielsen: This is intriguing and I will have to explore Tinycat, which I haven't even pursued as a subject in my LT application. Thanks for the tip.

8bnielsen
Edited: jan 29, 2022, 1:32 pm

>7 JoeB1934: And thanks for starting me on this.

I think I found a bug in TinyCat.
On https://www.librarycat.org/lib/bnielsen/item/211493689
under Genres the word Technology links to https://www.librarycat.org/lib/bnielsen/search/genre/20275895/Nonfiction
which gives "No results".

(I'll bug that up in the Bug Collectors group if it hasn't been described yet)

However https://www.librarycat.org/lib/bnielsen/search/genre/63 gives the desired result and even better https://www.librarycat.org/lib/bnielsen/search/genre/63?page=2 etc allows you to page through all your books in a given genre.

So to sum it up:
https://www.librarything.com/catalog_bottom.php?specialpage=genre

gives you the genres and the mouseover text gives you the numerical id for each genre. I.e. Technology = 63.
And using urls like https://www.librarycat.org/lib/bnielsen/search/genre/63?page=1 , https://www.librarycat.org/lib/bnielsen/search/genre/63?page=2 , https://www.librarycat.org/lib/bnielsen/search/genre/63?page=3 will list the books in the genre.

Sports & Leisure is genre 64
wget -q 'https://www.librarycat.org/lib/bnielsen/search/genre/64?page=1' -O - | sed -e 's/</\n</g' | grep '^<a href="/lib/bnielsen/item/[0-9]\+">$' | grep -o '[0-9]\+'

gives

161438723
23291513
162628342
44828702
113265960
22689342
104227603
116805907

which are the Book IDs for the books in my library that are in the genre Sports & Leisure.

9bnielsen
Edited: jan 31, 2022, 8:58 am

Just to add a bit to >8 bnielsen: the interesting thing about URLs like this is that you can write a script to go through them all and get the list.

https://www.librarycat.org/lib/bnielsen/search/genre/63?page=1
https://www.librarycat.org/lib/bnielsen/search/genre/63?page=2
https://www.librarycat.org/lib/bnielsen/search/genre/63?page=3

In the LT web interface you can also click through the pages, but that's not as easy to write a script for.
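A rough sketch of such a script, under a couple of assumptions that have not been checked against TinyCat: that a page number past the last one contains no item links, and that the item links keep the `/lib/NAME/item/ID` form seen in >8. The function names and the page cap are made up for illustration.

```shell
# Pull Book IDs out of one search-result page (HTML on stdin).
# sed is used instead of grep -o so digits in the library name
# (e.g. JoeB1934) are not mistaken for an ID.
extract_book_ids() {
  tr '<' '\n' | sed -n 's|^a href="/lib/[^/]*/item/\([0-9][0-9]*\)">$|\1|p'
}

# Download one result page; arguments: library, genre id, page number.
fetch_page() {
  wget -q "https://www.librarycat.org/lib/$1/search/genre/$2?page=$3" -O -
}

# Print every Book ID in a genre; arguments: library, genre id, [max pages].
# Stops at the first page without item links (assumed to mean "past the end"),
# or after max pages as a safety net in case that assumption is wrong.
list_genre_books() {
  lgb_page=1
  lgb_max=${3:-50}
  while [ "$lgb_page" -le "$lgb_max" ]; do
    lgb_ids=$(fetch_page "$1" "$2" "$lgb_page" | extract_book_ids)
    if [ -z "$lgb_ids" ]; then
      break
    fi
    printf '%s\n' "$lgb_ids"
    lgb_page=$((lgb_page + 1))
  done
}

# Possible use (needs network access):
#   list_genre_books bnielsen 63
```

Keeping the download in its own small function makes it easy to swap in a cached copy of the pages while experimenting.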

ETA: the bug report is here: https://www.librarything.com/topic/339162

10JoeB1934
jan 31, 2022, 4:49 pm

>9 bnielsen: I am very appreciative of your work on this issue.

I certainly see where you are coming from. When I clicked on your first URL, seeing 'The Mythical Man-Month' brought me back decades, to when I was actually a software developer. I started in 1960 with Fortran and punched cards on mainframes, then moved into PL/I and later PowerBuilder. If I were somewhat younger, I would probably seriously pursue your ideas; however, at 87 about all I can do is apply Excel to my data analysis.

I think LT is a fantastic product, but I have been hoping that they could make what I consider a minor programming change: providing export fields which are easier to deal with. All I want is one line per book and numbers instead of stars. I have requested more complex changes, like displaying certain retrievals in a library. I am very pessimistic about that request being scheduled when a much simpler change to the export file is probably not forthcoming.

Another request I have which should be simple enough is to allow something other than check marks in a tagmash retrieval. I really need to sort such retrievals in order to bring to the top those books in my library.
Enough of my complaining, thanks for your efforts on my behalf.

11bnielsen
Edited: feb 1, 2022, 4:15 am

>10 JoeB1934: Thanks for the praise. I was born in 1960, so I wasn't exposed to Fortran until much later. I do remember one Fortran book with a toy example of finding a knight's tour on a chess board though. It found a tour in a fraction of a second. Later I came across the same example in a Pascal book where the solution was rather elegant and used recursion and would probably still be running now if I hadn't stopped the program and nobody had scrapped the machine :-)

I'll do a bit of experimenting with tagmashes to see if I can come up with something useful.

So far I've gotten a nice solution to my use of genres, so thanks for giving me the reason for digging into that.

BTW in >7 JoeB1934: you mention that you hadn't used Tinycat prior to this. How about the LT App?

12bnielsen
feb 1, 2022, 5:31 am

Digging a bit into Tagmashing:
https://www.librarything.com/tag/pirates,+zombies
is a tagmash that includes
https://www.librarything.com/work/2250245

So the result of a tagmash is a set of works. This makes sense, since my own tags belong to my books. But the "Other people's tags" belong to works.
And so the "Member tags" I mentioned in >6 bnielsen: are actually found if I look at "Tags" at the work level.
I also need to choose to see All tags since only then does Pirates and Zombies show up in the list.

*check the tags; -MR; 0-WSH-Digi; 0-WSH-Thrift; 1 yr; 100 Books 12; 6; 8; 817.008; 2011; 2012; 2013; 2014; 2015; 2015/11/11; 2016; 2018; 2019; 2020_GR_Import; 2021-0125-C; 21st century; 4/19; 5-stars; 6/13; 75 Books Challenge for 2013 wishlist; 7s; ?; @childrenYA; donate box 2; a0611; acquired 2016; adult; adult picture book; adult picture books; adult storybook; adults; All My Friends Are Dead series; all-books; Amazon; AmazonList; American; Amsterdam; amusing; anti-children's books; apartment; art; avery monsen; avery-monsen; BD; best-reads-ever; black comedy; black humor; bmc-resale-hardcover-first-editions; book club; book-titles-that-rock; books-to-buy; borrowed; Box 1; C5; Carnegie Library of Pittsburgh; cartoons; change; children; children's; children's book; children's books; children's books for adults; children's literature; chronicle books; classic; clowns; coffee table; collection - memento mori; Colonie PL; comedy; comic; comics; Comics & Graphic Novels; comics-or-graphic-novels; comicz; contemporary; cp2015; cute; cz2015; dark comedy; dark humor; death; death and dying; December; December 2010; departamento20110730; digital copy; dinosaur; dinosaurs; dinosaurs-dragons; discovered; downstairs; ebook; end tables; English; existentialism; extinction; F MONSE; favorite; favorite-books; favorites; Feb-11; feelings; fiction; Fiction - Novel; finished; first edition; First Floor; first read 2011; friends; friendship; Friendship - wit and humor; fun-and-games; funny; funny book; g humor; G19; gelb; gift; gift from soup; give-me; goodreads; goodreads import; Goodreads20190417; Google; gooodreads_import_20161106; graphic format; graphic novel; graphic novels; Graphic Novels/Comics; greadsimport; grim humor; GRimport; grown up picture book; grown-up; hahaha; hardcover; HC; hope; humor; Humor/Satire; humorous; ILL; illustrated; illustration; illustrations; in library; irony; JN; John; John (Jory); john jory; Jory John; jory-john; July 2011; June 2016; kat; 
kidlit; kids; Kindle; Kindle Books I Own; kindle-owned; Laura; lendable; library; library-library; lido; Literatura infanto-juvenil; Literature & Fiction; little books; loneliness; longevity; lost socks; macro; maybe-to-read-but-not-in-clan; mine; Miniature book; Miscellaneous; monsen; Monsen (Avery); moochable; mortality; my-library; na; NC1429; new; no tags; non-fiction; non-juv; optimism; overdrive-hoopla-scribd-ku; own; Own (Print); own-physical-copy; owned; owned-books; owned-but-not-read; paper copy; parodies; parody; philosophy; picture book; pirates; poetry; ppld; primary school; purchase; quirky; read; read aloud; read and to read; read but unowned; read in 2011; read in 2014; read in 2016; read in 2017; read in 2018; read in 2019; read in a day; read online; read-at-hastings; read-graphic-novels; read-in-2013; read-in-english; read-in-foreign-language than mine; read2013; Rebecca; recommend; recommendable; Recycled; requested; reread; returned; reviewed; sad; sarcasm; satire; scribd; secondary; series; sewickley public library; SH; shop.vintiquebook.store; short read; short-reads; stacey-s-books; stinky; Subject: Humour; t0611; tebeo; teen; teen appeal; TEMP LOCATION: box 25; theme - funny books about death; to-read; to-read (#1289); top-10000; trees; unowned; USA; uwce; Videocassettes; vintage book collective; vintiquebooks; visual; want to read; wit; Wit and humor; YA; you-make-me-smile; young adult; z-2012-reads; z-2015-read; zombies; zread-in-2017; àlbum; мayвe;

So if I understand >10 JoeB1934: correctly, your needs would be met if something like the list above was available either directly in your catalogue or maybe just in the search interface? I'm thinking something like
MemberTags: pirates AND MemberTags: Dinosaurs
or some fancy syntax like
TagMash: pirates,+zombies
but only with results from your own catalogue.

I think the last feature would be nice but would probably only be used by a handful of LT'ers?

13JoeB1934
feb 1, 2022, 9:52 am

>11 bnielsen: I have the LT App on my phone but don't see any use to me in this regard

14JoeB1934
feb 1, 2022, 10:13 am

>12 bnielsen: The way I searched for the tagmashes that I ended up with was by creating URLs and saving them as notes. This idea was provided to me by two other members early on in my project.

I created about 30 different note/URLs and used them to converge on my final set.
You are correct that they produced a retrieval which was a mix of my books and others' books. I did a copy and paste of that retrieval into Excel. Unfortunately, the presence of checkmarks in the copy made it hard to sort in Excel.
So, I split it in Excel into title and author. Even that is a bit of a mess because of a need to strip trailing blanks, etc. But I got it down to a fairly simple process. However, if there was a way to show my books in my library for the tagmash it would have saved me a lot of work.

The REAL bottom line is that I most likely represent a VERY small member of the LT total members and I just ought to get on with life as it exists.

My problem is that I have built commercially viable software and the best product always came from a collaboration between me as a software designer and users who knew what THEY needed.
In my experience software designers can create really neat products but it is best if users get to fine tune it.

LT is a fantastic product and users have been instrumental in its creation, so my needs legitimately can be bypassed.

Take a look at my profile and you will see what path I am on. It appears that there are a few others who would like to do a similar thing and it isn't at all realistic to expect them to jump through the Excel hoops I went through, which was fun for me.

15bnielsen
feb 1, 2022, 10:24 am

>14 JoeB1934: Thanks for sharing. I once wrote a "solve the 8-queens problem" program in Excel so I know a few hoops too :-)
These days I use LibreOffice Calc to convert csv to xls automatically, mostly to keep my users from getting confused by non-ASCII characters.

16bnielsen
Edited: feb 4, 2022, 11:29 am

>13 JoeB1934: I was just asking because I've seen some corners in LT that were mostly unknown to me before this conversation. TinyCat and the App also blend things in an interesting way.

ETA: Some more findings: The work page contains a popularity number:
I.e. work 37213 is currently nr 14789 in popularity. And a script can get this.

wget -q https://www.librarything.com/work/37213/popularity/ -O - | grep -o '<a href="/work/[0-9]\+/popularity">[0-9,]\+' | cut -f2 -d\>

returns 14,789
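If the number is going into a spreadsheet or a database column, it is probably handier without the thousands separator. A small sketch, assuming the link markup stays as in the command above; the function name `parse_popularity` is made up, and it reads the page HTML on stdin.

```shell
# Read a work's popularity page (HTML) on stdin and print the rank
# as a plain integer, e.g. 14,789 becomes 14789.
parse_popularity() {
  grep -o '<a href="/work/[0-9][0-9]*/popularity">[0-9,][0-9,]*' | cut -f2 -d'>' | tr -d ','
}

# Possible use (needs network access):
#   wget -q https://www.librarything.com/work/37213/popularity/ -O - | parse_popularity
```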

If you are in Your Books say after a search:
https://www.librarything.com/catalog/bnielsen&deepsearch=troskyldige%2C+volt...
you can click on Covers and then on one of the covers and a pop-up window will give you
a line like this (amongst other stuff):
Popular tags: fiction, philosophy, French literature, to-read, French, 18th century, France, short stories, satire, classics

I don't know how to get that via a script, though.

The list looks like the ten highest-ranking member tags, since both Member tags and Member tags (show all) show more.

17bnielsen
feb 5, 2022, 11:09 am

I think I'll be exploring what extra information I can get from downloading some of the work pages.

18bnielsen
feb 9, 2022, 12:36 am

I'm currently considering something like this:
"Work_id","Sample_Date","Member_Count","Review_Count","Popularity","Average_Rating","Mention_Count","Tag_List","Work_Information","Rating_List".
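Since tag lists and work information can contain commas and quotes, each field needs proper quoting when a row is written. A minimal sketch of a row writer follows; the function name `csv_row` is made up, and the values in the usage comment are invented sample data, not real measurements.

```shell
# Print the arguments as one CSV row: every field is wrapped in double
# quotes and any embedded double quote is doubled.
csv_row() {
  csv_first=1
  for csv_field in "$@"; do
    csv_escaped=$(printf '%s' "$csv_field" | sed 's/"/""/g')
    if [ "$csv_first" -eq 0 ]; then
      printf ','
    fi
    printf '"%s"' "$csv_escaped"
    csv_first=0
  done
  printf '\n'
}

# Example with invented values for the first three columns:
#   csv_row 37213 2022-02-09 1234
```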

19JoeB1934
feb 10, 2022, 1:40 pm

This would be extremely important for me if I could get such an export on all books.

20bnielsen
feb 11, 2022, 3:35 pm

Just adding a very interesting link.

https://www.librarything.com/stats/MEMBERNAME/tagmash

21JoeB1934
feb 13, 2022, 9:29 am

>20 bnielsen: Yes, I use this all the time. Clicking on any tagmash string brings up all 500 books retrieved and check marks for those in my library. The check marks make it impossible to sort the list to isolate those books. I have requested elsewhere that the checkmarked books be shown in my library.

22bnielsen
feb 13, 2022, 11:39 am

Mostly note to self :-)

More fun stuff. I have been experimenting a bit with my own library. I have 7989 works.

cat work.db | perl /tmp/headchg --delete | wc -l
7989

And they have just shy of 300K different tags. (Which shouldn't really surprise me as a look at the tags (show all) for https://www.librarything.com/work/17336 should show.)
One of the tags on that work is this:

The novel was initially published in an edition of 1.000 copies by the Orioli Press in Florence in 1928. and was almost immediately pirated by at least four different publishers. Lawrence had always intended to release a cheap version. and was spurred on

It really gives some fun inputs on how to use tags. And it seems that there's a length limit on a tag at about 256 bytes.
Maybe also a limit to the number of tags shown? Frankenstein maxes out at 5000 tags, including these four: ● ☀ ♀ and ✓

Since people use all kinds of characters in their tags, I can't just use say ; as separator in that field in my own database, so I'm using two spaces instead. And if a tag contains two spaces I'll reduce that to one. And spaces in front or at the end of a tag are deleted. The idea is that I can search that field just as any other field.
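The cleaning rules above can be sketched like this, assuming the raw tags arrive one per line on stdin (that input layout and the function name `normalize_tags` are made up for illustration; empty tags are simply dropped, which the description above does not cover).

```shell
# Read one tag per line; collapse runs of spaces inside a tag to a single
# space, trim leading/trailing spaces, drop empty tags, and join the
# result with two spaces as separator.
normalize_tags() {
  sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//' |
    grep -v '^$' |
    awk 'NR > 1 { printf "  " } { printf "%s", $0 } END { if (NR) print "" }'
}
```

Because no cleaned tag can contain two spaces in a row, the separator can never be confused with tag content.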

I think I might need a few versions of the tag list for each work.
One with counts:
fable (1) fables (1) fiction (1) fox (1)

One without counts:
fable fables fiction fox

One short list:
1001 books 19th century British British literature classic classic fiction classic literature classics ebook English English literature fantasy fiction Frankenstein gothic horror Kindle literature Mary Shelley monster monsters novel own read Romanticism science science fiction sf to-read unread

One short list with counts:
1001 books (100) 19th century (615) British (278) British literature (383) classic (1,407) classic fiction (99) classic literature (209) classics (1,412) ebook (179) English (178) English literature (356) fantasy (294) fiction (3,351) Frankenstein (218) gothic (844) horror (2,355) Kindle (168) literature (788) Mary Shelley (174) monster (170) monsters (256) novel (576) own (151) read (449) Romanticism (209) science (147) science fiction (1,572) sf (131) to-read (901)
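All four variants could come from one routine. The sketch below assumes the tags are available as `tag<TAB>count` lines and that the "short" list is simply a minimum-count cut-off; that cut-off rule, the input layout and the function name `format_tag_list` are guesses for illustration, not how LT picks its short list. Counts are printed without thousands separators.

```shell
# stdin: "tag<TAB>count" lines.
# $1: minimum count to keep a tag (default 1, i.e. the long list).
# $2: "counts" to append "(count)" to each tag, anything else for plain tags.
# Output: one line, tags separated by two spaces.
format_tag_list() {
  awk -F '\t' -v min="${1:-1}" -v mode="${2:-plain}" '
    $2 + 0 >= min + 0 {
      item = (mode == "counts") ? $1 " (" $2 ")" : $1
      out = (out == "") ? item : out "  " item
    }
    END { if (out != "") print out }'
}
```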

23bnielsen
feb 14, 2022, 4:42 am

More notes to self :-)

I've put the workinfo into my normal database, making it go from 30 MB to 60 MB. That already allows for fun queries like:

cat /tmp/ltnc.rdb | perl /tmp/row Members_Tag_List mat '/ fiction /i' and Tag_List nmat '/;fiction;/i' | perl /tmp/column Title | perl /tmp/headchg --delete

I.e. Books that others have tagged as fiction and I have not.

Alas this includes a lot of stuff, that I really wouldn't call fiction:
Donald E. Knuth: The Art of Computer Programming - Vol 1: Fundamental Algorithms

but one LT user has tagged it as such. Oh well.
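For readers without the perl row/column tools, roughly the same query can be sketched in awk. The assumptions here are mine: a tab-separated file with a single header line naming the columns Title, Tag_List and Members_Tag_List, own tags separated by ; and member tags separated by two spaces. The function name `tagged_by_others_only` is made up.

```shell
# stdin: tab-separated table with a header line.
# $1: a tag. Prints the Title of every row where the tag appears as a
# whole tag in Members_Tag_List but not in Tag_List (case-insensitive).
tagged_by_others_only() {
  awk -F '\t' -v tag="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map column names
    {
      members = tolower("  " $(col["Members_Tag_List"]) "  ")
      mine    = tolower(";" $(col["Tag_List"]) ";")
      if (index(members, "  " tag "  ") && !index(mine, ";" tag ";"))
        print $(col["Title"])
    }'
}
```

Padding both lists with their separator means only whole tags match, so "classic fiction" does not count as "fiction".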

So I think I'll use the short list (which I haven't implemented yet) for stuff like finding fiction amongst my non-fiction, but the long list for tagmash-like queries such as this:

cat /tmp/ltnc.rdb | perl /tmp/row Members_Tag_List mat '/ dinosaurs /i' and Members_Tag_List mat '/ pirates /i' | perl /tmp/column Title | perl /tmp/headchg --delete

Den hemmelighedsfulde Ø
All my friends are dead
Den hemmelighedsfulde ø

cat /tmp/ltnc.rdb | perl /tmp/row Members_Tag_List mat '/dinosaurs/i' and Members_Tag_List mat '/pirates/i' | perl /tmp/column Title | perl /tmp/headchg --delete

En Verdensomsejling under Havet
En Verdensomsejling under Havet
Den hemmelighedsfulde Ø
All my friends are dead
Den hemmelighedsfulde ø

I can also find out that the tag "The sea ships and Pirates" is giving me "En Verdensomsejling under Havet", i.e. 20,000 Leagues Under the Sea.

Fun stuff!

24JoeB1934
feb 14, 2022, 9:35 am

>22 bnielsen: That sounds very useful, but I am not at all confident that my programming skills are up to it. Thanks for the concept.

25bnielsen
feb 15, 2022, 2:29 am

>24 JoeB1934: I think I've gotten most of my programming errors ironed out, so I'm ready to give your books a try. Drop me a comment with your email address if you are interested. I can get Book_Id and Work_Id from Your Books and produce a text file that I _think_ Excel might be able to import :-) If it clogs up Excel I'll see if I can limit my script to produce something more simple.

26bnielsen
feb 15, 2022, 4:50 am

Ah, fun stuff. I don't think I've come across books with this kind of warning:

https://www.librarything.com/work/10412765/book/212168754

I'll just pretend I didn't see it :-)

27bnielsen
feb 15, 2022, 6:32 am

Mostly note to JoeB1934, but maybe others will feel inspired? (or bored to death?).

With a bit of trouble I now have a text file without the one book from >26 bnielsen:. It _should_ be easy to import to Excel (i.e. I haven't tried).
It lets me do stuff like:
cat /tmp/joenc.rdb | perl /tmp/row Members_Tag_List mat '/romance/i' and Members_Tag_List mat '/widow/i' | perl /tmp/column Book_Id | perl /tmp/headchg --delete | xargs echo | sed -e 's/ /+OR+/g'

I.e. I'm searching in the column Members_Tag_List and if it contains the words romance and widow, I take the number in the Book_Id column. The numbers are then glued together with +OR+ giving:
212169498+OR+212169612+OR+212169482+OR+212169723+OR+212170575+OR+212168826+OR+212168857+OR+212169353+OR+212168999+OR+212168556+OR+212169329+OR+212169623+OR+212169528+OR+212169194+OR+212169431+OR+212169103+OR+212170079+OR+212170072+OR+212168675

which can be appended to search in JoeB1934's library:

https://www.librarything.com/catalog/JoeB1934&deepsearch=212169498+OR+212169...

I think this can also be done in Excel more or less manually depending on your excel expert rating :-) However the columns in /tmp/joenc.rdb are very wide, so it might run into a limit in Excel?

(The normal search box doesn't allow so long a search string, so that's why we take this detour).
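The gluing step can also be sketched without xargs and sed, building the whole URL in one go. The URL shape is the one shown above; the function name `deepsearch_url` is made up, and whether LT accepts arbitrarily long URLs of this kind has not been tested.

```shell
# stdin: one Book_Id per line. $1: member name.
# Prints a catalog URL whose deepsearch term is the IDs joined with +OR+.
deepsearch_url() {
  awk -v member="$1" '
    NF { q = (q == "") ? $1 : q "+OR+" $1 }
    END { if (q != "") print "https://www.librarything.com/catalog/" member "&deepsearch=" q }'
}
```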

28JoeB1934
feb 15, 2022, 11:17 am

>27 bnielsen: I am sorry that I haven't been back with you faster. Doctor visits and other "normal" activities of someone at 87 years old. Give me a few days and I will get back with you about how to proceed.

29bnielsen
feb 15, 2022, 1:47 pm

>28 JoeB1934: No need to say sorry :-) Worst thing that can happen is that I get time to document what I've been doing. So far it's been great fun. Thanks for starting me on this.

30bnielsen
feb 20, 2022, 7:25 am

Just fixed an obnoxious bug / wrong assumption. Some of the tags I got from the work pages are not utf-8, so trying to do anything clever on that basis threw a lot of error messages. (I think most of the errors came from utf-8 strings clipped after say 255 bytes, leaving half-a-character.)

Notes to self:
cat /tmp/lt.rdb | perl /tmp/column Work_id | perl /tmp/headchg -del | sort -u | sort -n | xargs perl testwork > work.db
cat /tmp/lt.rdb | perl /tmp/column Work_id | perl newcolumn-from-workdb > /tmp/nc
paste /tmp/lt.rdb /tmp/nc > /tmp/ltnc.rdb

Fun? stuff:
cat /tmp/ltnc.rdb | perl /tmp/row Members_Tag_List mat '/crime/i' and not Tag_List mat '/;crime;/i' | perl /tmp/column Title | grep Small

Samtaler med rabbiner Small

I.e. someone tagged "Conversations with Rabbi Small" by Harry Kemelman with the tag "crime". I wonder if they have read it?

More to do? Adding Work_Title, maybe Work_Description? Maybe using two spaces as separator in my own data?