200,000 blank tags

Discussion: Bug Collectors

1norabelle414
nov 2, 2018, 9:14 am

As far as I am aware, creating a blank tag should be impossible. I have tried a few ways of adding a blank tag to books in my catalog and I can't do it. However, there seem to be over 200,000 uses of a blank tag in the system. Due to the known bug where combining a tag starting with a double quotation mark results in a blank tag combination proposal, the blank tag has been combined with a variety of other tags including "culture", "Brazillian literature", and most relevant to LT, "LTER". The screenshot below can be found at http://www.librarything.com/tag/detail/culture#tagaliases.



The fact that the blank tag is combined with all these other tags is a problem that can be fixed by tag separation, but where did all these blank tags come from?

2norabelle414
nov 2, 2018, 9:18 am

Here is the blank tag in the site Zeitgeist

3bnielsen
nov 2, 2018, 9:45 am

It might be a blank?

http://www.librarything.com/tag/%20&norefer=1

And of course I found myself searching for hints for this bug.

http://www.librarything.com/tag/&norefer=1
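For comparing the two links above: `%20` is the percent-encoded form of a single space, while the second link has nothing at all after `/tag/`. Other links in this thread use `+` where a space would be, which is the form-style encoding of the same character. A minimal sketch with Python's standard library, assuming the site decodes tag names in URLs the usual way (that part is a guess, not something confirmed here):

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

space_tag = " "   # a tag consisting of exactly one space
empty_tag = ""    # a truly empty tag

# Encoding: a space never encodes to the empty string.
print(repr(quote(space_tag)))       # '%20'
print(repr(quote_plus(space_tag)))  # '+'
print(repr(quote(empty_tag)))       # ''

# Decoding: '%20' and '+' both come back as a single space.
print(repr(unquote("%20")))     # ' '
print(repr(unquote_plus("+")))  # ' '
print(repr(unquote("")))        # ''
```

So a URL ending in `%20` or `+` points at a one-space tag, which is not quite the same thing as a tag with no characters at all.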

4elenchus
nov 2, 2018, 11:08 am

And to be clear, blank is separate from a tag comprised of a space or multiple space characters?

5norabelle414
nov 2, 2018, 11:10 am

>4 elenchus: You can't make a space tag. If you try, when you refresh the page it will be gone.

6lorannen
nov 2, 2018, 11:37 am

Oof, this is weird. I'm digging around seeing if I can figure out where it exists (i.e. member catalogs), but Tim will almost certainly have to find the why. So many combinations proposed for this one!

7norabelle414
nov 2, 2018, 11:49 am

>6 lorannen: If you click on the first link in >3 bnielsen:, you can see the books/people who have used that particular alias of the "culture" tag. There's no way to get to it from the "culture" tag page, because there is nothing to click on, so I'm guessing bnielsen got there through URL manipulation

8bnielsen
nov 2, 2018, 12:49 pm

>7 norabelle414: correctly guessed.

9lorax
nov 2, 2018, 1:49 pm

lorannen (#6):


So many combinations proposed for this one!


The combinations are an artifact of the bug that proposing a combination involving a tag starting with a quotation mark (") ended up as proposing a combination with a blank tag. That may be a good place to start, actually.
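Nobody in this thread knows the actual cause, so the following is only a guess at one common way a leading quotation mark can turn a value into a blank: the tag name is written into an HTML attribute without escaping, and the attribute ends at the first quote. The form field, the `round_trip` helper and the `ValueReader` class are all hypothetical and are not taken from LibraryThing's code:

```python
from html import escape
from html.parser import HTMLParser


class ValueReader(HTMLParser):
    """Collects the value attribute of every <input> element it sees."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.values.append(dict(attrs).get("value"))


def round_trip(tag_name, escape_it):
    """Render a tag name into a hypothetical form field and read it back."""
    rendered = escape(tag_name, quote=True) if escape_it else tag_name
    reader = ValueReader()
    reader.feed('<input name="tag" value="%s">' % rendered)
    reader.close()
    return reader.values[0]


quoted_tag = '"test quotations"'
# Unescaped: the value attribute ends at the tag's own leading quote.
print(repr(round_trip(quoted_tag, escape_it=False)))
# Escaped: the full tag name, quotes included, survives the round trip.
print(repr(round_trip(quoted_tag, escape_it=True)))
```

If something like this is happening, only tags that begin with a double quote would collapse to blank, which would match the reports.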

10gilroy
nov 2, 2018, 1:50 pm

>6 lorannen: A lot of the present proposals are separations, and going through them there's a LOT of duplication because (I think) three different members are all proposing the same separations without waiting for the earlier ones to clear before putting in more.

11lorannen
Edited: nov 2, 2018, 6:07 pm

This is somewhat eluding me. Even going through members' catalogs who in theory have used the tag, I can't find it once I'm there.

>10 gilroy: Can you spell that out for me more clearly? What same thing are they doing?

ETA: Thanks very much for the blank tag page link in >3 bnielsen: (https://www.librarything.com/tag/%20&norefer=1)—that was a big help!

That said, I'm stumped and have called in the reinforcements. I suspect that this is data that's been floating around for some time, but has only been surfaced and made visible by the combination with the culture tag.

12MarthaJeanne
nov 2, 2018, 6:29 pm

>11 lorannen: There is a long list of separations suggested on the culture tag page. These are from three different members, and there is a lot of duplication.

mmseiple has proposed separating the tag culture from literatura brasileira.
Vote: Yes | No | Undecided Current tally: Yes 15, No 0

SandraArdnas has proposed separating the tag culture from Literatura brasilera.
Vote: Yes | No | Undecided Current tally: Yes 12, No 0

Dariah has proposed separating the tag culture from Literatura Brasileira.
Vote: Yes | No | Undecided Current tally: Yes 7, No 0

Until these current proposals have cleared we can't see which have been missed.

13omargosh
Edited: nov 7, 2018, 8:46 am

I've occasionally come across members with tags that contain leading or trailing spaces, or see these up for combination. I wondered if these were perhaps entered at a time before such whitespace issues were fully caught / dealt with. It wouldn't surprise me if 200,000 blank tags were entered before code fully dealt with whitespace well. I suppose that if any validation happens on the client side, say with JavaScript, instead of server-side with php or whatever, there could possibly be a browser contribution at play. But when looking at the apparent top uses in bnielsen's first link, and sorting those catalog pages by entry date, most of the books apparently using the blank tag have entry dates "suspiciously" no later than July, 2009 (with one exception: http://www.librarything.com/catalog/P.S.Dorpmans&tag=+&alias=1 seems to have two such works that got entered on November 15, 2014).
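As a rough sketch of the kind of server-side whitespace handling described above (the function name and the exact rules are assumptions, not LibraryThing's actual code), a tag could be trimmed before saving and dropped if nothing is left:

```python
from typing import Optional


def normalize_tag(raw: str) -> Optional[str]:
    """Strip leading/trailing whitespace; return None if the tag ends up empty."""
    cleaned = raw.strip()
    return cleaned or None


# Hypothetical inputs, including the space-only and empty cases from this thread.
for raw in ["culture", "  culture ", " ", ""]:
    print(repr(raw), "->", repr(normalize_tag(raw)))
```

Anything saved before a check like this existed would stay in the data unless it was cleaned up separately, which would fit the pre-2009 entry dates.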

14norabelle414
dec 3, 2018, 6:29 pm

None of the separation proposals for "culture" went through properly.
See https://www.librarything.com/topic/299683

15SandraArdnas
dec 3, 2018, 7:07 pm

Yes, I reported it as a separate bug and linked to this in case the two issues are related

16norabelle414
jan 7, 2019, 9:46 am

bump

17norabelle414
may 23, 2019, 9:52 am

bump. I notice that the LTER tag is no longer combined with this one, but all the other tags are still combined with 200,000 blank tags.

18MarthaJeanne
may 23, 2019, 11:34 am

It would be lovely if staff could at least separate these from legitimate tags.

19norabelle414
aug 20, 2019, 2:18 pm

This seems to be pretty sorted out now. The tags "culture" and "Brazillian Literature" and "Antwerp" and "LTER" all look fine. I can't remember what other tags were lumped in there.

The 200,000 blank tags still exist but they are lumped in with various punctuation, not anything that might get messed up: https://www.librarything.com/tag/detail/+#tagaliases

Not sure if I should mark it as "closed" since the blank tags themselves technically still exist?

20SandraArdnas
feb 16, 2020, 6:25 pm

Alas, the problem is back and again the blank tag appears as an alias of 'culture' tag and it dragged with it over a thousand unrelated tags, which are now combined with it even though there was no vote on those.

Just take a look at the aliases in there, it's far worse than before

https://www.librarything.com/tag/detail/culture#tagaliases

21norabelle414
feb 17, 2020, 11:19 am

Yikes! Thanks for noticing!!

22xaagmabag
apr 11, 2020, 7:59 pm

The tag "culture" now has over 1400 unrelated tags combined with it, including practically every Portuguese literature tag and a host of Shakespeare and Dostoevsky tags. This bug should probably be bumped up in priority before every tag on LT falls into this black hole.

http://www.librarything.com/tag/culture

23SandraArdnas
apr 11, 2020, 9:50 pm

Moreover, separations do not happen after a successful vote. There was around a hundred proposed, voted and cleared, but none were actually separated.

24xaagmabag
apr 12, 2020, 1:14 pm

I have noticed that separations are taking a very long time to show up in the system as separated after a successful vote. I made a load of separation proposals for Mao from many variations of Mao Zedong many, many, many weeks ago (2-3 months at most?) and only realized that they were separated just yesterday. I should have kept better track of when the separations reached a successful vote and when they actually separated.

Without knowing how the underlying process happens behind the scenes, my only guess is that this process gets scheduled either during some sort of "down time" when the "system isn't busy" or happens when some sort of threshold is reached (like some number of combine/uncombine in the thousands?). Or maybe it's just one process in a very long queue of processes? It almost seems random to me. I have seen the effect on the closed list of proposals because the number of tags I voted on temporarily becomes smaller, but then shows the correct number like a day or two later. The actual combining and separating seems to take place much later after the threshold and closed lists get updated.

One of the things that bothers me about this process is that since the tags haven't yet been "officially" combined, they are still open for combination again. While voting, I see many tags that I already proposed being re-proposed (and sometimes re-re-proposed!). So in effect, we're re-doing work that already has been done.

I wonder how the system handles a re-proposal after the system has officially combined them from the original proposal?

25norabelle414
jun 5, 2020, 9:47 am

The "culture" tag is such a nightmare, please help!!

Along with shakespeare, U.S. History, brazilian literature, and portuguese literature, this tag now contains:
read in 2005
read in 2008
read in 2011
read in 2013
hugo nominee
argentinian literature
the dark is rising
ya-paranormal
horror-thriller
P. G. wodehouse
religious beliefs
owned-and-unread

is there a way to at least stop this tag from eating all the others?

26aspirit
jun 5, 2020, 10:08 am

Fixing the bug that creates blanks in combination requests would be a start in preventing this.

27aspirit
jun 5, 2020, 10:11 am

When are the separations going to happen? Votes are up to 23 Yes, 0 No, 0 Undecided.

28Nevov
Edited: sep 27, 2022, 10:27 pm

This bug seems to still be plaguing the system, for example:
https://www.librarything.com/tag/detail/Mai
seems to be combined with the blank tag/punctuation tag mentioned in >19 norabelle414:
(Edit, this is the norefer page for the Mai tag: https://www.librarything.com/tag/Mai&norefer=1)

I've noticed while looking around at it, when I went to the CK tab, in the canonical form, I tried setting a non-blank (the next one on the dropdown list, two dots), and it recorded my action in the CK history: https://www.librarything.com/commonknowledge/changelog.php?f=62&item=37&...
But it didn't actually allow the different canonical form to be set. Could a bug in that process somehow be contributing to the black hole, aggregating tags where the canonical form has gone wrong and interpreting them as blank?

29kristilabrie
sep 28, 2022, 8:56 am

So, let me get this bug straight, since a lot of the examples are outdated:

- There is a bug, wherein when someone combines a tag that starts with quotations with any other tag, the system recognizes the tag starting with quotes as a blank tag? Do we know if it is only happening with tags that start with quotations, or is there another instance that we have seen?
- When those combinations happen, valid tags (such as "Mai" mentioned in >28 Nevov:) get lumped in with this "empty" (though the URL indicates it's a space) tag: https://www.librarything.com/tag/%20&norefer=1. Is that right?

How have these issues been getting cleaned up: do you have to separate the valid tag from each of the punctuation tags that show up on that "blank" tag page (from the drop-down options, under "Propose Separation")?

30norabelle414
sep 28, 2022, 9:20 am

The "combining tags with quotes" is a separate bug that long predates this one, I think there's a separate bug report around somewhere.

It would be extremely rare for a tag combination with a blank tag to go through, it would require 5(ish) people to vote yes and zero people to vote no without reading the combination proposal (because if they read it it would obviously look wrong). So I'm not sure that's how this is happening.

As far as I know, no user has ever been able to separate the blank tag from any other tag. The last time we tried (>14 norabelle414:) it didn't work, so I think it must have been separated by a staff member.

31kristilabrie
sep 28, 2022, 9:23 am

Testing a number of test tags/combinations:

1. Combine ". test period" (no quotes) with "test tag to combine with" (no quotes, AKA "test tag" in list below), view on "Combine/Separate" tab = OK
2. "/ test slash" (no quotes) with test tag = OK
3. " space before tag" (no quotes) with test tag = Combination did not show up on Combine/Separate page. I didn't see any errors in the console but it appears to have gone into a black hole. Attempted twice, same result. *

NB: Another bug, I think: I'm given links to two different tag pages from "Your books", depending on whether I click on my newly added tags before/after refreshing the page. Before refreshing the page, I get https://www.librarything.com/catalog_bottom.php?tag=+test+space+before+first+tag whereas after refreshing the page I get https://www.librarything.com/catalog.php?tag=test+space+before+first+tag&vie...

* The combination proposal was sent to https://www.librarything.com/tag/detail/test+space+before+first+tag#combinations, without the plus/space (but from the tag page with the plus/space).

4. reproduced #3 with https://www.librarything.com/tag/detail/+test+double+space+after+comma#combinati... and https://www.librarything.com/tag/detail/test+double+space+after+comma#combinatio...
5. " test space after quotes" (with quotes) with test tag = disappears on https://www.librarything.com/tag/detail/%22+test+space+after+quotes%22#combinati..., pops up on https://www.librarything.com/tag/detail/+#combinations
6. "test quotations" with test tag = disappears on https://www.librarything.com/tag/detail/%22test+quotations%22#combinations, doesn't add a new combination at https://www.librarything.com/tag/detail/+#combinations = so, quotations seem to just error out the rest of the tag seeing it as blank or something.

Will see if I can bring this to the developers to see if they can find a fix.

32kristilabrie
sep 28, 2022, 9:25 am

>30 norabelle414: Okay, thanks. So, do we know how the blank tag is getting created? I wonder if it's getting created through this combination process or something. Or am I missing anything?

33norabelle414
sep 28, 2022, 9:25 am

Here is the oldest bug report I could find for the quotes-in-tag-combinations bug: https://www.librarything.com/topic/99481

34norabelle414
Edited: sep 28, 2022, 9:31 am

>32 kristilabrie: We do not know how they are being created. I tried a LOT of different things to recreate it and I never could. Also it's been 4 years and there are basically the same number of blank tags (actually ~5000 fewer) so I think their ongoing creation is probably not something to be worried about.

35kristilabrie
sep 28, 2022, 9:32 am

>34 norabelle414: Ahh, ok. I'm wrapping my head around this, now. When I was digging into it, I think >13 omargosh: had a good lead there - from the catalogs I looked at, the ones who were using that tag were all before July 2009. So it must have been old code and how blank tags were able to be saved/created. Thanks.

36timspalding
mar 6, 2023, 10:30 am

I have a script that can fix this. Marking as important and for me.

37timspalding
mar 7, 2023, 9:48 am

Okay, marking as fixed. The core book data has been fixed. It's going to take a while to propagate to all the cached and summarized data.

38norabelle414
mar 13, 2023, 12:17 pm

This is still broken. If you go to the tag "Mai" (https://www.librarything.com/tag/Mai) it is combined with the blank tag but it does not show up on the list of aliases, and therefore cannot be separated. There might be other tags stuck in there too but there's no way to know.

39kristilabrie
mar 14, 2023, 9:45 am

1. go to https://www.librarything.com/tag/detail/Mai#aliases and see the blank tag with over 200,000 uses
2. go to https://www.librarything.com/tag/detail/Mai#combinations and see that knerd.knitter has proposed a separation for Mai from that blank tag, and the threshold has been met, but it remains.

40MarthaJeanne
mar 14, 2023, 9:58 am

>39 kristilabrie: But it hasn't been open long enough to close.

"Tag voting
All of the currently-open tag combination and separation votes can be seen here: https://www.librarything.com/tags_combinations.php
Threshold: At present, an answer meets the threshold if it has more than four times as many votes as the opposite, and is winning by five (5) or more votes.
Closing votes: For a vote to close it must have been open for at least a week. It may close at any time after that.
Decided votes: For a vote to be decided, it must be at a threshold when it closes."
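The quoted rules can be written out as a small check. This is a sketch of the rules as worded above, not the site's code: the function names are made up, and Undecided votes are ignored because the quoted text doesn't say how they count.

```python
from datetime import datetime, timedelta

MIN_OPEN = timedelta(days=7)  # "open for at least a week"


def meets_threshold(votes_for: int, votes_against: int) -> bool:
    """More than four times the opposite answer AND ahead by five or more."""
    return votes_for > 4 * votes_against and votes_for - votes_against >= 5


def can_close(opened: datetime, now: datetime) -> bool:
    """A vote may close any time after it has been open for a week."""
    return now - opened >= MIN_OPEN


def decided_answer(yes: int, no: int):
    """Return 'yes' or 'no' if that answer is at threshold, otherwise None."""
    if meets_threshold(yes, no):
        return "yes"
    if meets_threshold(no, yes):
        return "no"
    return None


# Tallies like those seen in this thread.
print(decided_answer(15, 0))  # yes
print(decided_answer(8, 2))   # None (8 is not more than four times 2)
print(can_close(datetime(2023, 3, 13), datetime(2023, 3, 14)))  # False
```

So a proposal can sit at threshold, as the Mai separation does, and still not take effect until it has been open for a week and actually closes.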

41xaagmabag
may 2, 12:01 pm

I just found another blank tag, plus a whole bunch of tags that are just various lengths of underscores, dots, and some other punctuation. I have put in proposals to separate them all from "Scratch N Sniff":

https://www.librarything.com/tag/detail/Scratch%20N%20Sniff#tab:combinations

I am certain that these were never proposed to be combined as most of us (except for one or two) would have voted NO.

42MarthaJeanne
Edited: may 2, 12:25 pm

>41 xaagmabag: There was a time when tags could be combined and separated without voting. Voting was introduced because so many bad combinations were being made.

https://www.librarything.com/topic/84796

That was in February, 2010. After 14 years, I think we have dealt with most of the worst combinations from before.

43norabelle414
may 2, 12:46 pm

>41 xaagmabag: That's the same blank tag, as mentioned in >38 norabelle414:, but we can see if the separations go through. There's no way to tell how many other tags are stuck in there, though.