CZ Talk:The Big Cleanup

From Citizendium


Questions that it would be useful to have answered. Is this a good idea? Is it worth the effort? Is this something we can expect to implement on a large scale? Is it too confusing to be implemented on a large scale? (If the answer is "yes," please be honest.) Should we add, or delete, any fields to the checklist? Should we add, or delete, any items to the cleanup "to do" list?

Particularly if you have been doing some testing, please give feedback here. Are there Article Checklist fields that you'd like to see added? Would you like to see new categories tracked? New things to put in the checklist?

If basic cleanup includes removing underlinking, will the links have to be reinserted as and when those new (related) articles come up in CZ? But will anyone be tracking them at that time? On the other hand, if the red tags remain, that may stimulate some of the contributors to start articles on those - at least in WP I had created many new pages from the underlinks. Supten 22:56, 8 March 2007 (CST)
Basic cleanup does not involve removing red links to articles, but only red links to nonexistent templates, pictures, categories, and interwiki links. --Larry Sanger 21:56, 11 March 2007 (CDT)

Hmmm. The workflow feels broken... or maybe I just didn't quite fall into one. Might do a bit more somewhere (there were four actual articles and eight redirects to Baha'i Faith in my dozen!) 'Dragon' Dave McKee 14:00, 9 March 2007 (CST)

Well, as the veteran of countless hours of busywork, my advice is to pick one workflow and stick to it. As you practice exactly that workflow, you get better and more efficient at it quite quickly. --Larry Sanger 14:57, 9 March 2007 (CST)

There seem to be some problems with the template implementation. Not specifying either "y" or "n" in the "underlinked" attribute leads to a "Not specifiedNo" output. That seems strange, as other attributes have no problem when no entry is specified. Also, specifying "status = 2" leads to the article (or rather, the talk page) becoming a member of "Computers Developing Articles" as well as "Computers Nonstub articles" (and if one specifies the status as 1, one additionally gains the "developed article" category). Shouldn't it be only one category? Otherwise "Nonstub articles" will become quite full. --Markus Baumeister 15:04, 9 March 2007 (CST)

I'll investigate and fix the first bug. "Developing Articles" are a subset of "Nonstub Articles," which is 1 + 2--a useful category, because it shows us how many (and which) articles are under active development and beyond stub stage. --Larry Sanger 16:02, 9 March 2007 (CST)

Another thing: It would be useful if the Template:Cite needed template existed. I just removed it several times from Computer Science because it was red. That seems wasteful. If WP thinks some citations are lacking, we should not remove those hints during cleanup. --Markus Baumeister 15:53, 9 March 2007 (CST)

I disagree. Articles are written for the end-user, not for contributors, and the Template:Cite needed template is a hint strictly for the use of contributors. Users should take everything in the article with great caution, regardless of any template, if it hasn't been approved by an expert. Besides, Wikipedians do not seem to be particularly good at deciding what does and does not require citations. --Larry Sanger 15:59, 9 March 2007 (CST)
I agree with Larry. In practice, many "POV-pushing" Wikipedians use Template:Fact or Template:Cite needed (or the worse Template:Dubious) just as a mark for the content they do not like (or a first step to removing something if the requested source is not given). So it needs a general cleanup. While sometimes it is well-intended and, in Wikipedia, perfectly OK, it looks useless here, as I guess we make the well-intended requests on talk pages. --AlekStos 04:05, 10 March 2007 (CST)

Hi Guys - I missed out canis familiaris and Canonical Gospels for now, and did an extra 2 instead. The canis article seems to have been developed a lot from the original Wikipedia version, so "External article: from another source, with little change" didn't seem right. And I wasn't sure about Canonical Gospels. --luke 19:25, 11 March 2007 (CDT)

Hi Luke, thanks, and as to canis f.--it's a redirection page and therefore needs no checklist. As to the gospel article, it's definitely beyond a stub, but with no intro at all you can't bold the title and therefore cleanup is not done. --Larry Sanger 11:54, 14 March 2007 (CDT)

Some feedback. It looks like a good idea. However, it was not as simple to do as I thought before. For example, it is not always clear whether the "WP checkbox" should be checked (see e.g. FCLB). Sometimes it was difficult to judge the status (is the article nearly complete or not that much??), especially when it concerned topics outside my domain (so virtually all articles given by the "standard" alphabetical range). I assessed as "almost complete" a viper article; at the same time a much longer article on another viper was labeled "developing" by a more qualified editor (BTW, the serpents are somehow special).

Is it worth the effort? I think so. It gives a good general framework. It can be useful for workgroups and improve some "management" of the project. I see an analogy with bookkeeping in a business - while sometimes it can be excessive or "virtual", basically it reflects the state of the project.

Is this something we can expect to implement on a large scale? I guess it's not impossible. I'd prefer to deal with my domain. This would at least double the speed and divide the number of doubts by two. However, if we put "cleanup your home" as the general rule, a problem arises, since not all workgroups are really active at the moment. So, maybe we could start the cleanup by domains where we have some active users and then pass to what remains.

Now, how many articles per author on average? As far as my scripts can tell, we have about 100 users visible on recent changes from March 1; about 40 of them have made more than 10 edits. So, roughly 30-40 articles per author. Looks feasible, not effortless, though. Not sure how many articles here on board; nor how many authors would be interested in "accountancy".

Other remarks/suggestions.

  • Almost all our articles - as far as I can tell - are "underlinked". I guess it still will be the case a few months from now. So the checklist entry looks superfluous.
  • Technical issue observed: the "Show changes" button does not work properly when editing a section (when editing the whole page it works well). BTW, glad to see the "WP content" checkbox working properly ;-)
  • Serpent articles are somehow special... These are often status 2 or 1 CZ Live _internal_ articles, and at the same time not very different from their Wikipedia versions, so, formally, status 4 _external_. I think this will more frequently be the case when we have more former WP authors on board. It is not clear to me whether the "content-from-WP" checkbox applies in such cases. After all, the article draws on the author's own knowledge and not so much on the content of WP. Maybe a declaration by the author would do?

Sorry, it was long (just some thoughts). --AlekStos 16:21, 12 March 2007 (CDT)

Thank you, Alek, for the best and most comprehensive comments yet. You're right that it isn't easy, it requires many judgment calls, and we need to expand our rule set carefully. But I also have become convinced that it will help tremendously to have all the various categories that the checklist creates--and to preliminarily assign all of our articles to workgroups.

You can deal with your domain, but first we need to assign articles to workgroups, so that you can review the articles in your domain.

As to the snake articles, they should be 4, I think. They are special in that the main author of those articles has declared that he wants to edit/maintain them here on CZ (well, perhaps pending a Forum discussion--I haven't checked in on that). But that doesn't make them any less 4s: they have not been significantly changed from the WP copies. I could be mistaken (a check of the page history is in order), but I doubt any of them have three or more edits in three different places, which is the minimum criterion for something to move out of the 4 category.

I believe you are mistaken about the proportion of underlinked articles. It's true that most articles are underlinked, but at present writing, 46 of 79 articles are underlinked. I would hope that this proportion will decline over time, and this is important to work on--especially within particular domains. For example, I can more easily find a "home" for an orphan philosophy article.

--Larry Sanger 20:27, 13 March 2007 (CDT)

Right. Assigning workgroups seems absolutely necessary; perhaps some snake checklists should be reconsidered; I have no big problem with the "underlinked" entry - I just had no chance to put "no" there. As a side note, please do not think that the term "accountancy" was used in a pejorative way. To make it clear, I think it is necessary for efficient management. --AlekStos 09:21, 14 March 2007 (CDT)

Any last objections, suggestions, etc. before we begin??? --Larry Sanger 13:53, 14 March 2007 (CDT)

Some stats. As of today, we have 1584 articles found e.g. on special pages->most viewed pages (and counting!). 1101 of them are marked CZ-Live, not too bad. 159 are marked as content-from-wikipedia. 832 have no workgroup assigned (I can provide lists if anyone is interested). I cannot believe that only 159 articles come from WP, and I think that virtually every article should have its workgroup. This is why the Big Cleanup is needed. --AlekStos 05:01, 15 March 2007 (CDT)
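
For perspective, the counts quoted above can be turned into shares of the total. This is only a rough sketch of the arithmetic: the figures are the ones given in the comment, while the helper name share and the labels are illustrative assumptions.

```python
# Figures quoted in the comment above (15 March 2007).
TOTAL_ARTICLES = 1584
COUNTS = {
    "marked CZ-Live": 1101,
    "marked content-from-wikipedia": 159,
    "no workgroup assigned": 832,
}


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in COUNTS.items():
        pct = share(count, TOTAL_ARTICLES)
        print(f"{label}: {count} of {TOTAL_ARTICLES} ({pct:.1f}%)")
```

On these figures, roughly half of the articles have no workgroup and about one in ten is flagged as Wikipedia content.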

update this sentence

I noticed a difference in this sentence: "We divide our body of articles into five categories: approved, developed, developing, stub, and "external" (i.e., borrowed from Wikipedia but not significantly changed). Furthermore, since every article is also marked with its..." and the categories listed in the template you are developing. -Tom Kelly (Talk) 12:58, 17 March 2007 (CDT)

Well, I recently updated the template. The sentence was updated to reflect the template update. Clear enough? --Larry Sanger 14:39, 17 March 2007 (CDT)

I did not see advanced or internal in the sentence, but they are listed further down in the article. I added internal and advanced to the sentence - feel free to remove them if that was not the right thing to do. I think internal and advanced could be defined here like external is. What do you think? Is it defined somewhere else? -Tom Kelly (Talk) 03:04, 18 March 2007 (CDT)

"Internal" refers to all articles in categories 0-3. "Advanced" refers to all articles in categories 0-1. Yes, it is defined somewhere else, namely, CZ:The Article Checklist (section "Article Status"). --Larry Sanger 08:10, 19 March 2007 (CDT)
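
The status numbers used in this discussion can be summarised as a small lookup. The Python below is only an illustrative sketch: the function name status_groups and the label strings are assumptions, and it does not reflect how the checklist template itself is implemented. It encodes what is stated on this page: 0 = approved, 1 = developed, 2 = developing, 3 = stub, 4 = external; "Nonstub" is 1 + 2; "Internal" is 0-3; "Advanced" is 0-1.

```python
# Status numbers as used on this page (labels are illustrative).
STATUS_NAMES = {
    0: "Approved",
    1: "Developed",
    2: "Developing",
    3: "Stub",
    4: "External",
}


def status_groups(status):
    """Return the status name plus the broader groupings it falls under."""
    if status not in STATUS_NAMES:
        raise ValueError(f"unknown status: {status!r}")
    groups = [STATUS_NAMES[status]]
    if status in (1, 2):
        groups.append("Nonstub")   # developed + developing
    if status <= 3:
        groups.append("Internal")  # categories 0-3
    if status <= 1:
        groups.append("Advanced")  # categories 0-1
    return groups


if __name__ == "__main__":
    for number in sorted(STATUS_NAMES):
        print(number, status_groups(number))
```

This also shows why a status 2 article lands in two categories at once: "Developing" is the status itself, and "Nonstub" is the broader grouping that contains it.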

Don't forget to include Health Sciences Workgroup in some Biology / Chemistry articles

I noticed that the articles on Vitamins were not tagged with Health Sciences Workgroup. I would like to stress that it is important that articles related to medicine, even though their main focus may be biology, chemistry, or biochemistry, should be included in the Health Sciences Workgroup. -Tom Kelly (Talk) 21:55, 22 March 2007 (CDT)

Tom, I'd like to stress that you're not in a position to "stress" something if it hasn't been decided. You are in a position to suggest it, though.  :-)

Why do I go out of my way to point this out? This isn't Wikipedia. On Wikipedia, people typically boss each other around, as if everyone really were an "editor"--as they call themselves. The result, to my mind, is completely ridiculous--just a bunch of people posturing as if they were in authority, when they're not in authority at all. Or, maybe more accurately put, they get their authority from the WP mob by acting as if they were in authority, and getting other people to agree with them.

On CZ, I'd like to encourage a different, more (small r!) republican ideal. Citizens can pressure each other in various ways, but they don't pretend to be able to order each other around. They persuade, they don't boss. If bossiness is required, that's what constables and, occasionally, editors are for. --Larry Sanger 13:29, 22 April 2007 (CDT)

Cleaning up "Articles for deletion"

In my first group of articles was the article "Tcf", which is marked "Articles for deletion." Should I bother doing the cleanup tasks to articles that are so marked? Bruce M.Tindall 18:18, 30 March 2007 (CDT)

This one is just skirting by over the speedydelete wire. The question is if the field is maintainable according to CZ:Article Deletion Policy in which we need an editor to decide to delete. I'll go ahead and put the cleanup template on. --Matt Innis (Talk) 22:15, 5 April 2007 (CDT)

Should we bother cleaning up big unchanged external articles?

Yesterday's Notice Board announcement said that external articles are being deleted "pretty aggressively." So if, during a Big Cleanup run, I find large articles copied verbatim from Wikipedia without significant changes, should I even bother removing the images, adding workgroup categories, etc., or should I just fill out a checklist with the category "External" and assume that the article will be deleted? Bruce M.Tindall 20:16, 5 April 2007 (CDT)

Bruce, go ahead and put the {{speedydelete}} template on the top of the article page and if it doesn't qualify, I'll take it off. --Matt Innis (Talk) 21:58, 5 April 2007 (CDT)


Please list any questions you have below, and Larry (or someone) will answer them.

What if I don't know what category to put an article in?

Choose the category from our list of workgroups that seems most likely to you, and then make sure that on the checklist you set cat_check to "yes", so that one line in the checklist template looks like this:
cat_check = y
If none of the categories look right, then add Category:Needs Workgroup to the article.

An article is not linked from other expected articles not because the links were not made, but because those other articles do not exist yet. Expected links from existing articles instead exist. Does such an article qualify as underlinked? --Nereo Preto 14:19, 8 March 2007 (CST)

Yes, it does. The point of tracking "underlinked" articles is that we want to encourage the development of the important conceptual pathways, as it were, to our relatively specialized articles. The more of these "in demand" articles we create, the more sense CZ will make to the end user. "Underlinked" articles are a superset of orphaned articles (articles to which no other articles link), but the reason for caring about the concept is roughly the same.
I don't understand underlinked. When I searched for it, I found it defined (somewhere -- sorry) in terms of not having a link from any workgroup -- and that's all. But this discussion seems to be about things to which there are not enough links in other articles. Is it obsolete, or is the other one wrong, or--? Daniel Drake 13:02, 10 April 2007 (CDT)

Does Topic Informant Workgroup exist? I assigned it once and it was red. --AlekStos 16:47, 12 March 2007 (CDT)

It does. Check the link: Category:Topic Informant Workgroup.
I see. It works in the article (=mainspace); in the checklist, however, it is red. Maybe I did something wrong, maybe it's a problem with the template -- see F. Albert Cotton, perhaps the first TI Workgroup article to be checklisted. BTW, I believe this Workgroup should appear somewhere over there. I put it under "Humanities", but do not know whether it is the right place (easy to revert). --AlekStos 09:55, 14 March 2007 (CDT)
Ok, I solved my problem: TI Workgroup Home Page does not exist yet (and this is what is linked from the checklist). --AlekStos 10:12, 14 March 2007 (CDT)

What is the workgroup for Acetabulum? -Versuri 05:56, 16 March 2007 (CDT)

Anything about ancient Greece or Rome goes in Category:Classics Workgroup. If you believe this might also properly belong in some other workgroup as well, you can always set 'cat_check = y'. --Larry Sanger 08:56, 16 March 2007 (CDT)

I have not removed the red link images of vipers. I think that Jaap will use it. -Versuri 07:54, 17 March 2007 (CDT)

All right; I would have removed them, but I don't think it matters much either way in this case. You could ask him. --Larry Sanger 08:40, 17 March 2007 (CDT)

A number of the "V" sets that people can sign up for contain only redirects to snake articles. I assume this means we can just cross those sets off as done? --Joe Quick | Talk 04:31, 19 March 2007 (CDT)

Yep! You get credit for discovering that there's nothing to do. --Larry Sanger 20:52, 20 March 2007 (CDT)

How should we deal with approved pages and their draft pages? It would seem like the approved version should be marked as "approved" while the draft page is marked as "developed." Does that make sense or are we not even going to add the checklist to both versions? --Joe Quick (Talk) 20:04, 20 March 2007 (CDT)

Add the checklist only to the draft talk page, not the talk page of the approved article, and call it approved (status = 0). The article talk page is supposed to redirect to the draft talk page, actually. --Larry Sanger 20:52, 20 March 2007 (CDT)

What are your techniques for searching WP content in CZ articles? Any tools for doing this effectively? Yuval Langer 17:57, 24 March 2007 (CDT)

You should probably make a trip to the page history in most if not all articles. If you want to determine whether an article is sourced from Wikipedia, then just look at the first version in the edit history. Virtually all Wikipedia articles left in the database have templates and images (that we have not uploaded, and thus are distinctive red links). That should be enough for us to tell whether to check the "Content is from Wikipedia?" box. If you want to determine how much an article has been changed from its Wikipedia original, go to the page history and press the radio buttons next to the oldest and the newest edits, and hit "compare". You'll be able to see the differences there. We have mostly been assuming that the original-uploaded version is identical to a Wikipedia original. --Larry Sanger 18:20, 24 March 2007 (CDT)
There is also a script that detects identical sentences in CZ and WP (should not be considered 100% reliable though) --AlekStos 03:05, 11 April 2007 (CDT)
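
For anyone curious how such a check might work, here is a minimal sketch. It is not the script mentioned above: the function names and the naive sentence splitting are assumptions made purely for illustration. It splits both texts into sentences, normalises case and whitespace, and reports the sentences that appear in both.

```python
import re


def sentences(text):
    """Split text into a set of normalised sentences (naive splitting)."""
    # Split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    normalised = set()
    for part in parts:
        # Lower-case and collapse runs of whitespace.
        cleaned = " ".join(part.lower().split())
        if cleaned:
            normalised.add(cleaned)
    return normalised


def shared_sentences(cz_text, wp_text):
    """Return the sentences that occur in both texts after normalisation."""
    return sentences(cz_text) & sentences(wp_text)


if __name__ == "__main__":
    cz = "Vipers are snakes. They are venomous."
    wp = "Vipers are  snakes. They live in Europe."
    print(shared_sentences(cz, wp))
```

Like the real script, a check of this kind should not be treated as fully reliable: abbreviations break the sentence splitting, and a lightly reworded sentence will not be matched at all.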

Should I be deleting categories in an article that are not directly one of the workgroups? (ie delete neurology and replace it with health sciences) David Martin 09:40, 29 March 2007 (CDT)

Yep. Particularly if a category is red. Delete all red categories; leave the blue categories so that we don't lose any potentially useful info. --Larry Sanger 12:26, 29 March 2007 (CDT)

After an article has had 3 major edits and has been tagged CZ Live, should the flag be removed stating that it is from Wikipedia? I was unsure if this flag is to give credit to Wikipedia for the basis of the article or if it is just to track those articles that need editing. David Martin 20:16, 2 April 2007 (CDT)

How much information is needed for an article to be flagged as containing content from Wikipedia? If there are only a few sentences or one section, does this constitute "containing content"? David Martin 22:10, 2 April 2007 (CDT)

I asked this over at and the answer was that the article retains the Wikipedia marking (The Scarlet W) forever. Or, I suppose, until all WP content, down to the one-sentence level, is eliminated, whichever comes first. I've been acting on that, so it would be good to know if it's wrong. Daniel Drake 13:19, 10 April 2007 (CDT)
My understanding is that the credit box should be checked as long as there is one sentence in common, see this [1]. In fact, a few hundred articles were checked according to this simple rule ;-) The exception is the situation when the same author releases the same text on WP and then on CZ. If we have a declaration from the author, then the credit box does not apply and we use the {{WPauthor}} template instead, see e.g. Talk:Gaspee_Affair, Talk:Newton's method. --AlekStos 03:07, 11 April 2007 (CDT)

Should I remove the Taxobox templates? Yuval Langer 00:18, 11 April 2007 (CDT)

Not sure what others have been doing, but if the taxobox template isn't uploaded, I'd move it to the talk page. --Larry Sanger 01:04, 11 April 2007 (CDT)
AFAIK, taxobox is re-implemented here and basically it works, so there is no need to remove it. It is used within many articles and I guess it has been left alone by BigCleaners (e.g. me). One exception is the field "status" of the template. It is not implemented and it results in some bizarre text, see e.g. Vipera seoanei. In this case I just put it in a comment (tags like this <!-- commented text -->) as it contains some information that will probably be useful in future. --AlekStos 02:49, 11 April 2007 (CDT)

How do we deal with disambiguation pages? More specifically, are they meant to be categorized, checklisted, and have workgroups assigned? If the answer is "no", who will "take care" of them? If the answer is "yes", what do we do with some "broad" disambigs such as Chinese, which involve quite a few workgroups? Let's make some standard for this (if one exists already, please point to it...) --AlekStos 11:28, 11 April 2007 (CDT)

Well, they aren't articles, so I'm inclined not to have them listed. On the other hand, we can always just create status = 5: other pages. (Or we might have 5 = catalog, 6 = list, and 7 = disambiguation.) What do the rest of you all think? --Larry Sanger 14:09, 25 April 2007 (CDT)
For the moment I have not checklisted the disambiguation page I found (Asp (disambiguation)) awaiting decision here. I don't think disambiguation pages belong to any workgroup. We can track them by having a disambiguation category/workgroup. I agree with the idea of having article types 5, 6 and 7 as Larry suggested above. Derek Harkness 07:37, 29 April 2007 (CDT)
I agree with the previous comment. BTW, we have the Category:disambiguation. It is generated by the {disambig} template. Of course the proper checklist status (5,6,7) would be a more 'systematic' solution, but the template has an additional feature: it generates a short text that informs the reader what kind of page he is looking at. The question is whether we consider this an advantage or a needless technical comment that distracts the reader from the content. I'm leaning to the latter (but still putting the template for a while to keep track). Thoughts? --Aleksander Stos 13:22, 30 April 2007 (CDT)
Just need to contradict myself. I gave more thought to 5,6,7 and think it's wrong. The article status 1 to 4 describes how the article is written, not what sort of article it is, whereas the suggested 5 to 7 describe the type of article but ignore how it is written. I could see the case where a Catalog article was also a status 3 stub or where a list was approved status 0. We need another field in the checklist for article type. Derek Harkness 10:42, 1 May 2007 (CDT)
Good point, that last --Larry Sanger 23:37, 30 April 2007 (CDT)

Should we be checklisting lists? I'm in the L's on the Big Cleanup but I don't know if the lists fall under this. They're technically not articles, are they? David Martin 10:50, 18 May 2007 (CDT)

Let's go ahead and checklist them. I think we'll probably be changing the checklist to track the type of page, in the future. But in the meantime, it will still be useful to have these pages tracked by workgroups. --Larry Sanger 11:10, 18 May 2007 (CDT)