Topic: Vote for your Guru of the day

 
_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 10:43

Hello world,
I am working on a little mod-bot, which will assist in the archival of sunken threads.

It takes into account two major elements, the posters and the words in a thread, as well as the post count.

Please don't spend the time telling me why I should not include post count, or sharing your views on how to make an AI. AI is a tedious and new field of research, and I don't want to waste this thread on a tantrum about why I made these choices: I have excellent reasons for them and will not answer anything questioning the engineering approach.

The technicalities are being discussed in more detail in the Mad Sci forums.

I am preparing the knowledge base of this bot in XML, as a tree listing who is considered a good poster for a given forum and which words are considered good or bad.

So what I need here is for the Asylum to tell me Guru names by forum. It's a difficult task, but I am asking you to vote for the people you think were most useful, Mad Scis or not, in a given forum.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 13:25

Somebody, please, do something that makes sense about this...

Whether you understand it or not, what is being developed here will make for better thread preservation, better archival, less redundant information, and less space and bandwidth load on the Asylum server, and therefore faster access speeds and less hassle for moderators, who will be able to focus on other tasks.

This is reallyreallyreally important.

And I am not asking anybody to get judgemental; I am asking for information that will be priceless for me.

Please, everybody, let down the ego, walk in my nerdy shoes for a second, and try to see things this way: I am able to deliver a golden enhancement to the Asylum, and I need you to dare pick a few mods you like. Without you, I am not getting anywhere.

And if you really want to give me tips about how to develop this, or have questions, email away at brundle21 at hotmail * com, but keep this thread safe from things that are not what I am asking for.

Please, some Mad Sci, back me up on this one. I am striving to get this thing done and to keep posts like Hugh's (in the Mad Sci forum) from spawning interference.

Hugh
Paranoid (IV) Inmate

From: Dublin, Ireland
Insane since: Jul 2000

posted 05-11-2006 13:47

"let down the ego"

lol

edit: also: I vote for: poi, mahjqa, Tao, dl-44, bitdamaged as those names pop to mind.

(Edited by Hugh on 05-11-2006 14:20)

Skaarjj
Maniac (V) Mad Scientist

From: :morF
Insane since: May 2000

posted 05-11-2006 13:59

_Mauro, calm down mate. Give people a chance to read, digest and reply first, hmm? Before you go off assuming no one cares or that the world is out to get you, wait a day or so, or maybe two, for replies. I, personally, am not going to venture an opinion one way or the other on this, for now. I've got too much going on to be drawn into any kind of debate/whatever-the-hell-else. Give me a few days, then I'll say my piece.


Justice 4 Pat Richard

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 14:00

Precision. Clockwork. Don't mean to be harsh, but I am 150% focused on getting it right. Can't explain being a nerd. Can get the job done though. Kinda like C3PO...

Blaise
Paranoid (IV) Inmate

From: London
Insane since: Jun 2003

posted 05-11-2006 15:11

Lol.

Do you want the guru and the forum name, or just a bunch of gurus?

DHTML/Javascript => poi/Ini
CSS - DOM - XHTML - XML - XSL - XSLT => reiso/Blaise!
Stupid Basic HTML => reiso
Photoshop => Tao
Philosophy and other Silliness => jade

Hugh
Paranoid (IV) Inmate

From: Dublin, Ireland
Insane since: Jul 2000

posted 05-11-2006 15:15

The Emperor, Synax, Slime and Bugimus, in case they ever come back.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 15:17

Blaise, I'd marry you. That's exactly it, e-x-a-c-t-l-y. Skaarjj, I understand, thank you; there's no hurry, I just want to avoid, at all costs, the kind of tantrum that gets confusing to me. Some day, I'll post something about an extreme coder's don'ts, to help people understand why I sound like that at times.

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted 05-11-2006 17:40

Are you asking about gurus who are active today or those from the entire history of the asylum?

.



-- not necessarily stoned... just beautiful.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 18:23

Entire history, please, but categorize them. Entire, entire, entire history, and even alternate or past nicknames, because we want to be able to treat any thread, and we want the system to recognize those good posters if/when they come back.

DL-44
Lunatic (VI) Inmate

From: under the bed
Insane since: Feb 2000

posted 05-11-2006 19:10

Ok, a few off the top of my head:

Javascript/DHTML:
poi, bitdamaged, Slime, Kuckus, Bugimus, Max

Server Side:
Tiberius Prime, Emperor, Bitdamaged, Mauro, Max

Photoshop:
Docile Bob, Tao, Dark Garden, Synax, Lacuna, Majhqa, Steve, JKMabry, Michael

Photography:
Steve, Shiiizzzam

Philosophy:
Bugimus, Web Shaman, DL-44, Suho_1004

CSS:
resio, divine chaos

Site Reviews:
Suho_1004, Cameron, Michael

big sigs:
Suho_1004, Mahjqa, Docile Bob, Tao

Plenty that I've left out obviously, but those are the ones who sprang to mind at the moment...

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 19:12

DL-44 is in the category named "global", which means the guy adds value to a thread regardless of the forum.
By the way, this is another category that matters: GLOBAL. Doc does that too; he makes sense wherever he posts.

And thank you, please keep it coming...

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-11-2006 19:36

I would also add that threads with posts by Weadah, TwiTch^, Mikey Milker or eyezaer would tend to be good threads.

Emperor would be another one you might want to put some relevance on, but you might want to be careful with that one, as he touched almost every thread there for a while.

If anyone has a copy of the old site's post counts: those at the top, with a few exceptions, tended to be the ones giving a lot of positive information, and threads they touched tended to be worthwhile.

Dan @ Code Town

Jestah
Maniac (V) Mad Scientist

From: Long Island, NY
Insane since: Jun 2000

posted 05-11-2006 19:41

"Please don't spend the time telling me why I should not ..."
"don't want to waste this thread on a tantrum about why I ..."
"let down the ego"
"Wether you understand it or not ..."

My God Ini, you sure know how to ask for help. My list would be similar to DL's except I'd include Ramasax, Pugzly and the Doc.

(Edited by Jestah on 05-11-2006 19:44)

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 20:54

"Let down the ego" is not from me.

AND..

quote:

My God Ini



bites tongue ..
Call me Mauro.

Anyway, I've got something, a rough draft of the knowledge base. It may be easier to explain my point with this:
http://www.beyondwonderland.com/asylum/knowledgebase.xml

Basically, the bot is wondering about stuff like "hmmm... this poster adds value to this thread and has posted x times among y posts. Besides, among A words, B percent are positive keywords relevant to this or that topic. The poster tends to add more value to this kind of forum, and these keywords correspond to this and that forum. In addition, there is a high post count for this thread."

With a well-balanced averaging of these factors, this simple default model can capture all sorts of relationships between posters, words, corresponding forums, and the notion of "relevance".
It was a pain to explain, and a pain to get right at first, hence the early "panic attack", but I think I got it right, so now you can critique it while adding to it, and suggest ideas.

But I think I can prove that most discussion situations are covered by a model like this, with an appropriate set of rules to balance the factors.

If I can't, more power to the people, and we can correct a wrong foundation. Sound right?

DmS
Maniac (V) Inmate

From: Sthlm, Sweden
Insane since: Oct 2000

posted 05-11-2006 21:10

Interesting initiative and a cool application. I hardly understand the beginnings of AI, but it's interesting and cool just the same.

I'd like to add DL-44 to Photoshop, Big sigs & CSS - DOM...

Looked through your XML file there, _Mauro. I don't know how you plan to match posters against posts, but if you are using nicknames, you need to spell-check some of them here and there.

Then another question (yes, I know it's technical, but it might help in the selection of Gurus):
How much weight do you put on a poster's post frequency if the other criteria are met for the post? I'm asking basically because there are people who post seldom but put a lot of energy into the posts they actually write (I know that I tend to be one of them), and I don't want those posts to fall between the cracks.

Good luck with this _Mauro! It will be very interesting to see how it turns out
/D

{cell 260} {Blog}
-{" Computer:
"As a Quantum Supercomputer I take advantage of Zeno's paradox"
C: "Imagine a photon that must travel from A to B. The photon travels half the distance to B. From its new location it travels half the new distance. And again, it travels half that new distance."
C: "It continually works to get to B but it never arrives."
Human: "So you keep getting closer to finishing your task but never actually do?"
C: "Hey, coders make a living doing that?"
"}-

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 21:34

It's really the beginning of such a thing that is hard to get right, and it's hard to keep it from drifting immediately into too much complexity, but from now on, all corrections, from spell checks on nicknames to <anything>, can happen. I don't have a problem with any kind of input past this point.

I'll try to expand a little bit...

- A thread has a "volume", probably its word count, meaning that 100% of a thread = all its words, regardless of the thread's post count.
- The thread post count gets used as a boolean fact of some sort, "thread contains more than x posts", yes/no, and has a low impact on the final balance: I'd say a 10th, maybe a 5th.
- Keywords, bad and good, can be sentences, as you can see. "thanks for sharing" typically means additional value for the thread; how much depends on the word count of the thread.

This is still to be determined when implementing the actual rules (these are only the facts of the facts/knowledge base), but the keyword added value is either general, in which case you get a global impact on the thread's worth, or associated with a forum, in which case you get a "likeliness to belong to that forum" AND added value on a global scale.

- Good poster appearances, again, should be used as a fraction of the thread post count, and should add to the global value as well as to the "relevance to this or that forum" factor.

This is a rough draft of the next, most tedious part of the project: setting and tuning the rules.
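For illustration only, the raw per-thread measures listed above can be sketched in Python. The `thread_features` helper, its argument shapes and the `min_posts=30` default standing in for "x" are assumptions; the list itself only says that volume is word count, that post count is a yes/no fact, and that good-poster appearances are a fraction of the post count.

```python
from dataclasses import dataclass


@dataclass
class ThreadFeatures:
    keyword_rate: float      # keyword hits as a fraction of the thread's word count
    good_poster_rate: float  # posts by good posters as a fraction of all posts
    is_long_thread: bool     # the yes/no "thread contains more than x posts" fact


def thread_features(words, post_authors, keywords, good_posters, min_posts=30):
    """Hypothetical helper computing the raw measures for one thread.

    words        -- every word in the thread (its "volume")
    post_authors -- one author nickname per post
    keywords     -- lowercase single-word keywords
    good_posters -- nicknames considered good posters (matched case-sensitively)
    min_posts    -- assumed value for x in "more than x posts"
    """
    volume = len(words)
    hits = sum(1 for w in words if w.lower() in keywords)
    good = sum(1 for a in post_authors if a in good_posters)
    return ThreadFeatures(
        keyword_rate=hits / volume if volume else 0.0,
        good_poster_rate=good / len(post_authors) if post_authors else 0.0,
        is_long_thread=len(post_authors) > min_posts,
    )


example = thread_features(
    "thanks for sharing this php tutorial".split(),
    post_authors=["DmS", "poi", "Hugh"],
    keywords={"php", "tutorial"},
    good_posters={"DmS", "poi"},
)
# keyword_rate = 2/6, good_poster_rate = 2/3, is_long_thread = False
```

These three numbers are only inputs; how much each one counts is exactly the rule tuning described as the next step.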

So, to answer your question, a small thread's relevance would be treated like a big thread's: one single post by a good poster would highly impact its value (and relevance to topics), and since one single intense post contains more words, it is more likely to contain keywords.

If your thread is that intense and the rules are finely balanced, it would just be kept.

....

Hell, I think that when the autobot goes live, it should first just *hint* at a classification, then later on classify, but leave unclassified entries, the ones that do not seem relevant, in the sink.
It shouldn't be massacring threads at random, but should cut the real mods' archival workload by 90% by reducing the archival load appropriately,
and ONLY WHEN IT IS REALLY SURE IT'S DOING A GOOD JOB.

But we should get there quite soon.

poi
Paranoid (IV) Inmate

From: Norway
Insane since: Jun 2002

posted 05-11-2006 21:38

Thanks for the guru praise. I wish I could post more often, but as of now most of the stuff I do is covered by this NDA thing... or is part of my next JavaScript demo.

DmS: I guess the overall post frequency (i.e. post count / Insane since) of a 'guru' will influence the relative weight of each of his/her posts. If a 'guru' has a low post frequency, then it must be a bad-ass guru, and each of his/her posts is worth its weight in black pills.
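One possible reading of this idea is an inverse-frequency weight: the fewer posts per day someone makes, the more each post counts. This is a sketch, not something the thread specifies; the `1 / frequency` formula, the `days_registered` input and the cap of 10 are all assumptions.

```python
def post_frequency(post_count, days_registered):
    """Posts per day since registration ("post count / Insane since")."""
    return post_count / max(days_registered, 1)


def per_post_weight(post_count, days_registered, cap=10.0):
    """Hypothetical inverse-frequency weight: the rarer the poster, the heavier each post.

    The cap keeps someone with a single post from outweighing everyone else.
    """
    freq = post_frequency(post_count, days_registered)
    if freq == 0:
        return 0.0  # no posts yet, nothing to weigh
    return min(1.0 / freq, cap)


print(per_post_weight(3650, 365))  # 10 posts/day -> 0.1
print(per_post_weight(73, 365))    # 0.2 posts/day -> about 5.0
print(per_post_weight(10, 3650))   # very rare poster -> capped at 10.0
```

Without some cap, a one-post member would dominate any thread they touched, which is the opposite of what DmS is worried about.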

_Mauro: How do you know what are the positive keywords, relevant to this or that topic ?

DL-44
Lunatic (VI) Inmate

From: under the bed
Insane since: Feb 2000

posted 05-11-2006 21:42

D'oh! How could I forget DmS for server side!?
After all, I've learned more about PHP from his posts and tutorials than from any other source.

DmS
Maniac (V) Inmate

From: Sthlm, Sweden
Insane since: Oct 2000

posted 05-11-2006 22:01

Ok _Mauro, that clears things up a bit for me.
As you say, as long as all the important parameters are found initially, the weight and priority of each can be adjusted at a later point. Adding parameters afterwards is way more complicated, since every new one basically resets all the properties of the existing ones.
It can quite quickly become very complex to tune all the rules, though, if there are too many parameters.
I've been slightly involved with fraud detection in online gaming, and it's hell to solidly detect patterns in large volumes with many parameters involved. It can be done, and it is being done, but it's hard to be sure. Cannot share any of it though; it's proprietary code.

And for starting out with hinting a thread during training, all 10 thumbs up!
That will give a very good indication on how it works.

I'll back out of this now and let the thread go back to what it's for: selecting posters.


And here's another Guru:
Steve for Multimedia/flash & photoshop (perhaps not frequent but darn good)

And thanx DL it's really appreciated. I like to share
/D


_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 22:13
quote:

Mauro: How do you know what are the positive keywords, relevant to this or that topic ?



They're not only words, but mottos, expressions and commonplaces relevant to the topic. I have an update, by the way, which will help me explain and hopefully live up to the challenge:
http://www.beyondwonderland.com/asylum/knowledgebase.xml

This is really just a matter of tuning, but everybody should help me tune it, for many reasons.
For one, it's hard to give "ratings per forum" to gurus; I mean, who am I to judge all alone?

But we have to be honest with the machine, and after all, this pseudo-"mark" just means "relevance of poster to a topic". This is the crux, sadly, oddly, funnily.

Then, for instance:

code:
<forum name="Photoshop" goodposters="docilebob, Tao, DarkGarden, Synax, Lacuna, Majhqa, Steve, JKMabry, Michael" goodkeywords="howto, tutorial" badkeywords=""/>
<forum name="Photoshop pong" goodposters="" goodkeywords="" badkeywords=""/>
<forum name="Big Sig" goodposters="Jung, Majhqa, Tao" goodkeywords="testing, howto, tutorial" badkeywords=""/>



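A sketch of how `<forum>` entries like these could be read with Python's standard `xml.etree.ElementTree` and compared against a thread. The `<kb>` root element, the shortened poster lists and the scoring (one point per good keyword or good poster, minus one per bad keyword) are assumptions for the example; the real knowledge base may be laid out and weighted differently.

```python
import xml.etree.ElementTree as ET

# Two <forum> entries in the same shape as above, wrapped in an assumed <kb> root.
KB_XML = """<kb>
<forum name="Photoshop" goodposters="docilebob, Tao, DarkGarden" goodkeywords="howto, tutorial" badkeywords=""/>
<forum name="Big Sig" goodposters="Jung, Majhqa, Tao" goodkeywords="testing, howto, tutorial" badkeywords=""/>
</kb>"""


def split_list(value):
    """Comma-separated attribute -> set of lowercased, stripped items."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def load_forums(xml_text):
    forums = {}
    for node in ET.fromstring(xml_text).iter("forum"):
        forums[node.get("name")] = {
            "posters": split_list(node.get("goodposters", "")),
            "good": split_list(node.get("goodkeywords", "")),
            "bad": split_list(node.get("badkeywords", "")),
        }
    return forums


def score(forum, words, authors):
    """Naive score: +1 per good keyword, -1 per bad keyword, +1 per good poster."""
    words = [w.lower() for w in words]
    keyword_score = sum(w in forum["good"] for w in words) - sum(w in forum["bad"] for w in words)
    poster_score = sum(a.lower() in forum["posters"] for a in authors)
    return keyword_score + poster_score


forums = load_forums(KB_XML)
words = "photoshop tutorial howto".split()
authors = ["DarkGarden", "Tao"]
for name, forum in forums.items():
    print(name, score(forum, words, authors))  # Photoshop 4, then Big Sig 3
```

Both forums get the same 2 points from the shared keywords; the good posters are what tip the thread towards Photoshop.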
While two forums share similar keywords, they don't share the same good posters, and they don't share ALL goodkeywords, so the relevance will fall on one side or another.
Some ambiguities will appear, and this leads us to...

The crunchy bit.


When those ambiguous moments come, the ES (expert system) must ask a human being for an additional "fact" that will help make things fall into one category or the other.
For example, an additional keyword.

OR.

An additional key poster.

This reduces the failure rate as the fact base grows, until it really tends towards 0.
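One simple way to detect an "ambiguous moment" is to compare the two best forum scores and ask a human whenever they are too close. In this sketch the `classify` name, the 0-100 score scale and the 10-point `margin` are assumptions.

```python
def classify(scores, margin=10):
    """Pick the best forum, or ask a human when the top two are too close.

    scores maps forum name -> relevance (0-100). Returns ("archive", forum)
    for a clear winner, or ("ask", [best, runner_up]) when the gap is under
    `margin` points, i.e. the case that needs an extra keyword or key poster.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return ("ask", [])
    if len(ranked) == 1:
        return ("archive", ranked[0][0])
    (best, top), (runner_up, second) = ranked[0], ranked[1]
    if top - second < margin:
        return ("ask", [best, runner_up])
    return ("archive", best)


print(classify({"DHTML": 70, "Server Side": 40}))  # ('archive', 'DHTML')
print(classify({"DHTML": 62, "Server Side": 58}))  # ('ask', ['DHTML', 'Server Side'])
```

Each answer a human gives (a new keyword or key poster) would go back into the knowledge base, so the same tie becomes less likely next time.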

Hence the three major stages in developing such an app.

So we should really fill this up according to this structure by expressing our own "private chart".
Posters in my XML who are not associated with a specific forum are global posters.
Doc O is the only one who has a "permanent" rating of 5.0. Not sure this is honest, but in the spirit of teaching the AI, this is Doc O's place, and since most of the work here has been enhanced by the ideas spread by the good Doc, I think it is.

Another nice game: finding the right "key expressions", or "mottos", that reflect the spirit of this or that forum and the relevance or irrelevance of threads.

Oh, and I'll take your suggestions into account, but I should sum them up for voting at some point, as I really am not able to judge everything here at a glance, and it is not my duty on this project.

One piece of good news, though: the more good posters we throw in, with an accurate relevance rating, the more accurate the ES will grow, because it will have more facts to study.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-11-2006 23:34

Ok, I am starting to really like how things are turning out.
It is still XML only, but now it is stuffed with "keywords"... which only matter in terms of relevance.

So, to spot them, I just browse through old threads from the sink and highlight whatever sounds... relevant to a general topic (DHTML, Photoshop, etc. forums).

http://www.beyondwonderland.com/asylum/knowledgebase.xml

And I have an example of a small thread that can be used to test this system..
www.ozoneasylum.com/27573

Out of 32 meaningful words, there are 4 "keyphrases" (1/8 of the whole word count), with a relevance to the Ozone forum and an overall relevance.
The ES would, in its infancy, catch this as good and mark it as archivable under Ozone.

We could then adjust its threshold.
Etc..
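The arithmetic above (4 keyphrases in 32 meaningful words, i.e. 1/8 or 12.5%) and the adjustable threshold can be sketched as follows. The function names and both threshold values are made up for illustration.

```python
def keyphrase_density(keyphrase_hits, meaningful_words):
    """Share of a thread's meaningful words accounted for by keyphrases."""
    return keyphrase_hits / meaningful_words if meaningful_words else 0.0


def is_archivable(keyphrase_hits, meaningful_words, threshold):
    """Hypothetical rule: archivable once the density reaches the threshold."""
    return keyphrase_density(keyphrase_hits, meaningful_words) >= threshold


print(keyphrase_density(4, 32))              # 0.125, i.e. 1/8
print(is_archivable(4, 32, threshold=0.10))  # True
print(is_archivable(4, 32, threshold=0.15))  # False
```

Moving the threshold from 0.10 to 0.15 flips this thread from archivable to not, which is the kind of knob "adjusting the threshold" would turn.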

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-12-2006 04:53

Looks like you have some gooposters in there, and I believe the spelling of twiTch^ is off.


_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-12-2006 13:08

Removed the gooposters and corrected the twitch, well spotted, thanks.
Added this, which is inactive for the moment:
http://www.beyondwonderland.com/asylum/kb.io.inc.php

The plan is to let people edit this directly, only the listed posters at first... I'll soon post a URL where you can register, either for a mailbox @beyondwonderland.com or with your own mailbox, to become a knowledge base reviewer.

Then people who care to edit this will produce "temp" iterations of the XML document, and they will receive a copy, as will I.

All temporary iterations will be made public, so that everyone can re-read them.

From then on, once the KB is full enough, the "real deal" can start and the bot can go live.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-12-2006 15:35

Ok, from now on,
when you reach:
http://www.beyondwonderland.com/asylum/

You are prompted for the following username / password:
guest
expertsystems

And you can then subscribe to be a reviewer of the automod system, giving you access to all the updates,
and options to access and edit the data contained in the knowledge base.
Reviewers who are part of the "good posters" list will be the only ones accepted during the first weeks of usage of this system.

Later on, when versioning is stable and a workflow / framework has been tested thoroughly, all subscriptions
will be processed and everybody will have access as a potential reviewer.

When the AI goes live, it will also ask good posters questions relevant to their forums of expertise, to improve its knowledge.

The accounts won't be enabled immediately, though, as lots of things are still in the works behind the scenes; I need a day or two
to set up the whole set of options (KB editing with versioning, KB draft publication, etc.).

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted 05-12-2006 21:32

General: DocOzone, NoJive, docilebob, Suho1004
DHTML/Javascript: DocOzone, poi, Slime, WarMage
Server Side: Slime, DmS, Tiberius Prime, silence, Skaarjj
CSS,DOM,etc.: poi, JKMabry, Mr Max, mas
HTML: DocOzone, JKMabry
Outpatient: JKMabry
Photoshop: DarkGarden, Vogon Poet, Weadah, Steve, mikey milker, Mahjqa, Suho1004, F1_error, docilebob, JKMabry, warjournal, Skaarjj
Photography: Shiiizzzam, Steve
Print: Steve, jstuartj
Philosophy: NoJive, Tao, Suho1004, Skaarjj, docilebob
Site Reviews: Dark Garden, mikey milker, DL-44


_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-13-2006 00:54

The "guest / expertsystems" thing is temporary.

It prevents some potential issues during development.
I have made a couple of things. First, a "keyword evaluator" of some sort; it detects words with some flexibility, and it does so fast.
Look: http://www.beyondwonderland.com/asylum/inference.engine.php?word=code
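The post doesn't say how the "flexibility" works. One plausible approach, sketched here as an assumption rather than the actual evaluator, is a case-insensitive whole-word regex that also accepts a plural "s". With `re.escape`, a keyword like "c++" could be stored plainly instead of pre-escaped as "c\+\+".

```python
import re


def keyword_pattern(keyword):
    """Whole-word, case-insensitive pattern that also accepts a trailing plural "s".

    re.escape keeps symbols literal, and the lookarounds stand in for \\b,
    which would fail next to non-word characters such as the "+" in "c++".
    """
    return re.compile(r"(?<!\w)" + re.escape(keyword) + r"s?(?!\w)", re.IGNORECASE)


def count_keyword(text, keyword):
    """Number of flexible matches of keyword in text."""
    return len(keyword_pattern(keyword).findall(text))


print(count_keyword("Code, code and more CODES in my codebase", "code"))  # 3
print(count_keyword("I write C++ and c++ daily", "c++"))                 # 2
```

"codebase" and "decode" are deliberately not counted; whether that is the right call for the Asylum's vocabulary is a tuning question.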

And there is the knowledge base interface, which now looks quite cool.
http://www.beyondwonderland.com/asylum/kb.edit.php

Versioning will allow keeping, comparing, merging and publishing a lot of versions of the KB, while using only one.
And it needs reviewers other than me; otherwise the AI will learn its basics from me only, which is bad in any event.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-13-2006 14:48

There still are tons of things to do, but it seems to be working pretty well.
Here, I am at the earliest stage of implementation, still.

I am teaching the thing to recognize DHTML threads using keywords only.
Over roughly 50 sink pages, it recognizes DHTML threads fairly well.

Only the "relevance" and a link to the thread are mentioned.

See for yourself:
http://www.beyondwonderland.com/asylum/inference.engine.php

( username: guest / pass: expertsystems )

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-13-2006 19:09

Ok, last update for today: results. Easy and fast to access this time; I removed the password protection.

It looks like:
http://www.beyondwonderland.com/asylum/ozone.html
http://www.beyondwonderland.com/asylum/dhtml.html
http://www.beyondwonderland.com/asylum/sside.html
http://www.beyondwonderland.com/asylum/photoshop.html

It is already able to detect threads for these topics based on keywords. Adding good posters and bad keywords should definitely produce the expected result, i.e. a good filter:
80-90% of posts archived at each pass.

Still, it is quite slow for the moment. It does a whole lot of computations: 10-20 seconds for 76 pages against one forum's keywords, when run as a batch script (what I do here is run it as a batch and cat the output to an HTML file). I could improve this by using other sorting / filtering methods, maybe, or finer memory management. Dunno. I can surely shave 20% off the execution time easily.

All in all, it should be made for 100-page passes. It would take 5 minutes to safely archive 80-90 threads out of 100. Run it twice a day, and the sink is empty within a couple of weeks.
Of course, the knowledge base has to be enhanced AND automated: when the bot is in doubt, as I said, it should ask.

Hugh
Paranoid (IV) Inmate

From: Dublin, Ireland
Insane since: Jul 2000

posted 05-13-2006 19:22

It's hard to tell: are these threads to be kept or deleted?

Hugh
Paranoid (IV) Inmate

From: Dublin, Ireland
Insane since: Jul 2000

posted 05-13-2006 19:23

Interference++

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-13-2006 19:27

Well, assuming it is a serious question: a thread is recommended for archival in the corresponding forum when it has >= 50% relevance to that forum.
For the moment, this is only the keyword search. Averaging this with good posters count * rating per forum will make it even more accurate.

The post count will count as a small bonus, 10% max.

These are the keywords:

code:
// Plain lists (no explicit keys): repeating a key such as 6 or 15 inside array() silently overwrites the earlier entry.
$dhtml = array("function", "dhtml", "jscript", "javascript", "ajax", "js", "div", "onmouseover", "onmouseout", "demo"); // DHTML
$photoshop = array("photoshop", "painting", "brush", "filter"); // Photoshop
$sside = array("php", "python", "perl", "sql", "c\+\+", "request", "message", "asp", "response", "database", "server", "apache", "java"); // Server-side; "c\+\+" is pre-escaped, presumably for a regex match
$ozone = array("video", "shocking", "pill", "birthday", "tv", "zeldman", "wife", "newborn", "tattoo", "companies", "congratulations", "hilarious", "mac", "joy", "ozone", "free"); // Ozone
$bigsig = array("testing", "runner up"); // Big Sig



And another sweet enhancement would be adding multi-word keywords. Currently, keywords are processed as single words.
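Supporting multi-word keywords such as "runner up" or "thanks for sharing" mostly means matching sequences of tokens instead of single tokens. A minimal sketch; the naive tokenizer (lowercase, split on whitespace, trim surrounding punctuation) is an assumption, not how the engine tokenizes today.

```python
def tokenize(text):
    """Naive tokenizer: lowercase, split on whitespace, trim surrounding punctuation."""
    return [t.strip(".,!?;:\"'()") for t in text.lower().split()]


def count_phrase(tokens, phrase):
    """Count occurrences of a keyword made of one or more words."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)


tokens = tokenize("Congrats, runner up! Thanks for sharing. Thanks for sharing, really.")
print(count_phrase(tokens, "runner up"))           # 1
print(count_phrase(tokens, "thanks for sharing"))  # 2
```

A single-word keyword is just the one-token case, so the existing lists would keep working unchanged.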

DL-44
Lunatic (VI) Inmate

From: under the bed
Insane since: Feb 2000

posted 05-14-2006 19:07

It seems this doesn't take into account the original forum it was posted in. I think that will be a key piece of information for archiving.

I only took a quick browse through, but there seems to be a lot of overlap (the same thread recommended for archival in multiple archives), and a lot that are recommended for archival in inappropriate areas (a POSER thread, presumably from the 3D forum, in the Photoshop archive, several server-side questions in the Ozone, a server-side one in the Photoshop, etc.).

Forgive me if this has been addressed already; I've only skimmed the latest threads on this subject. Just an observation.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-14-2006 20:22

Thank you for bringing this up.
The original forum was not shown in the sink thread, so I could not analyse it. Anyway, I think some threads from this or that forum are relevant to another forum at times, but...
You're right in that human mods do move threads to the right place, so having and using this info would make things a lot easier.

This information belongs to TP for the time being, and I designed my thing without taking it into account because, as the sink stands, it's not available.

The rating overlap is normal, since some threads are relevant to different topics at the same time... server-side threads are relevant to coding, but not to DHTML.
So basically, for the moment, the engine guesses where it should put relevant threads based on the highest rating.

Therefore, threads that overlap belong more to one forum than another.
And only scores higher than 50% are meaningful to the engine.

When you see things in that light, and you know the engine would only process according to the highest rating, and only ratings above 50%, it makes much more sense and seems to get it right every time.

Give it a second check with this info and let me know... (not that your advice is not welcome: it's actually the kind of fine tuning helper pointers I need from now on - so really, let me know)

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-14-2006 20:44

Attempt at laying out the rules:
A = global goodkeywords rate per page
B = global badkeywords rate per page
C = global goodposters rate per page
D = forum specific goodkeywords rate per page
E = forum specific badkeywords rate per page
F = forum specific goodposters rate per page
G = Archivability rate = ((A-B) + C) / 2 + 10 IF I is true
H = Relevance to a specific forum rate = (((D-E) + F) / 2)
I = Postcount >= 30
J = If page archivability (G) >= 50, archive the thread in the archive defined by max(H)

Yeaaah... that's about it. These aren't the comprehensive details of the implementation; these are the rules that match the knowledge base definition to fulfil the job.
Right now, only rule D is executed.

In the end, rule J will be called on all pages, and will recursively evaluate all the rest.
In the real world, it just works. I mean, if getting D alone already tells me where most posts belong, having the whole picture and a good fact base will do a hell of a job.

Hard to explain in simple words though...
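Rules G, H, I and J translate fairly directly into code. This sketch assumes A to F are already computed as percentages (0 to 100), with D, E and F available per forum; the function names, the `rates` layout and returning None for an empty forum map are assumptions.

```python
def archivability(a, b, c, post_count, min_posts=30):
    """Rule G: ((A - B) + C) / 2, plus 10 when rule I (post count >= 30) holds."""
    bonus = 10 if post_count >= min_posts else 0
    return ((a - b) + c) / 2 + bonus


def forum_relevance(d, e, f):
    """Rule H for one forum: ((D - E) + F) / 2."""
    return ((d - e) + f) / 2


def archive_target(a, b, c, post_count, per_forum, threshold=50):
    """Rule J: if G >= 50, return the forum with the highest H, else None.

    per_forum maps each forum name to its (D, E, F) rates for this thread.
    """
    if archivability(a, b, c, post_count) < threshold or not per_forum:
        return None  # not archivable: leave the thread in the sink
    relevance = {name: forum_relevance(*rates) for name, rates in per_forum.items()}
    return max(relevance, key=relevance.get)


rates = {"DHTML": (70, 5, 40), "Server Side": (30, 0, 20)}
print(archive_target(60, 10, 40, 35, rates))  # G = 45 + 10 = 55, H(DHTML) = 52.5 -> DHTML
print(archive_target(60, 10, 40, 10, rates))  # G = 45 < 50 -> None
```

The second call shows the post-count bonus from rule I being the deciding factor for a borderline thread.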

DL-44
Lunatic (VI) Inmate

From: under the bed
Insane since: Feb 2000

posted 05-15-2006 01:32
quote:

_Mauro said:

When you see things in that light, and you know the engine would only process according to the highest rating, and only ratings above 50%, it makes much more sense and seems to get it right every time.

Give it a second check with this info and let me know... (not that your advice is not welcome: it's actually the kind of fine tuning helper pointers I need from now on - so really, let me know)


Certainly, as a concept, this makes sense. With the particular threads that I did look at, it just didn't fit, though. I didn't take specific info from them, but basically there were a handful of very decidedly server-side-coding-oriented posts I looked at (about 5 that I clicked on) that were set to be archived in the Ozone forum. While surely almost any post *could* fit in the Ozone forum, these should definitely have been in Server Side. There was one that I clicked on that was a server-side coding question, set to archive in the Photoshop section.

Obviously this would be more helpful had I logged the threads for you to look at - will keep that in mind in the future

Now, as for the forum of origin - I just went back to check, and sure enough I cannot find that info. I seem to recall, however, that on hover, the forum of origin used to pop-up (via the title attribute), but now all it says is "from the sink".
Is this something that has changed, or is my memory faulty?

Having the original forum is invaluable, whether the system is human run or system run...


I think this is a very interesting concept overall, and very worthy of following. The human method, while more trustworthy in some aspects, is clearly not working all that great at the moment. In addition to being very subjective, it relies on people being available to do the work when required.

With 267 pages worth of threads in the sink, clearly the people have not been available

mas
Maniac (V) Mad Librarian

From: the space between us
Insane since: Sep 2002

posted 05-15-2006 13:13
quote:
With 267 pages worth of threads in the sink, clearly the people have not been available


Well, I did a lot of "sinking", but it was nearly impossible to do it often enough for the sink to get smaller. Skip 2 days and nearly a whole new page will be waiting for you. We would have needed more "humans"... MANY more... and these have not been available, as you correctly mentioned.

The Space Between Us | My Blog: lukas.grumet.at

NoJive
Maniac (V) Inmate

From: The Land of one Headlight on.
Insane since: May 2001

posted 05-15-2006 17:11

Probably way off base with this thought... but here goes anyway.

Under 'goodkeywords', how about something like "Please archive this" or some similar phrase?

For example, let's say I'm following a thread in 'Server-Side Scripting - Oh my!' and I'm finding it pretty informative and want to make sure it gets archived.

Could I simply post "Please archive this" so that when the "mod-bot" goes about its business, it knows for sure to archive that particular thread?

Workable? Probably not, but I thought I'd throw it out anyway.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-15-2006 23:26

Well, such a system already exists: preservation words "can" be added, and this has been possible for a long time.
The fact is, people don't use it as much as they should.

I am actually using good keywords to evaluate a thread's relevance to a forum; the whole concept takes into account the fact that a thread can be posted in one forum
but be relevant to another (if I post a thread about Java in DHTML, it would be better off in the Server-Side archive).

So yes, your idea could work, but in practice it hasn't: it probably never caught on.

My proposal has this advantage: once it is finely tuned, it does 90% of the archiving on its own, and when it finds a thread with an ambiguous classification, it asks for more keywords or more info
about a poster so that it can classify better in the future.
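To make the "ask for help when the classification is ambiguous" idea concrete, here is a minimal sketch, assuming each archive already has a numeric relevance score for the thread. The `classify_or_ask` name and the `min_score`/`min_margin` thresholds are hypothetical, not automod's actual code: a thread is only archived automatically when the best archive clearly beats the runner-up.

```python
# Hypothetical sketch: archive automatically only when the top-scoring
# archive clearly wins; otherwise flag the thread for a human.
# The thresholds below are illustrative assumptions, not tuned values.

def classify_or_ask(scores, min_score=3, min_margin=2):
    """Return (archive, needs_human) for a dict of archive -> relevance score."""
    if not scores:
        return None, True
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Too few matches anywhere: not clearly worth archiving.
    if best < min_score:
        return None, True
    # Two archives are neck and neck: the classification is ambiguous,
    # so suggest the best guess but ask a human to confirm.
    if best - runner_up < min_margin:
        return best_name, True
    return best_name, False
```

For example, scores of 7 vs. 2 get archived straight away, while 5 vs. 4 are flagged for a human. Requiring a margin, rather than just a high top score, is what keeps the bot to threads that clearly belong to one archive.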

This would leave real-world mods with the following tasks: moderating, e.g. closing threads and moving relevant threads to the archives where they belong.
Furthermore, the bot would "actively" ask for the teeny-weeny bit of advice it needs, when it needs it, and only then.

Of course, if we assume threads already "are" in the forum where they should be archived, my job becomes a lot smaller, but I think that would be wrong... the more I think about it,
the less sense it makes: why not automatically archive EVERYTHING to the matching archive, then? Because some threads ARE ambiguous, and not everything is interesting.

Better to make the engine really "guess" the relevance than to force it.
------------------------------------

The goal here is to make it clever enough to take over 90% of the human mods' duties, and that can be done if we make sure it only archives what clearly belongs to one archive or another.


Spotting poorly classified threads and reporting them to me is a great help: they "show" me, in plain English, the keywords I need to rebalance.

For instance, I have found that the keyword "new" is very good for identifying threads worthy of archival, but terrible when it comes to identifying the forum
of a thread, because something is "new" every day.
Poorly chosen, useless keywords also tend to reduce the overall relevance score of a given forum: spotting them immediately improves relevance selection for that forum.
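One way to spot keywords like "new" before they skew things, sketched under the assumption that you have a few sample threads already filed per archive: count in how many archives' samples each keyword shows up. A keyword that hits nearly every archive adds roughly the same score everywhere, so it cannot tell the archives apart. The function names and the `max_share` cutoff are hypothetical.

```python
import re

def keyword_pattern(word):
    # Whole-word, case-insensitive match (hypothetical matching rule).
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)

def keyword_spread(keyword, samples):
    """Number of archives with at least one sample thread containing keyword."""
    pattern = keyword_pattern(keyword)
    return sum(
        1 for threads in samples.values()
        if any(pattern.search(text) for text in threads)
    )

def weak_keywords(keywords, samples, max_share=0.5):
    """Keywords that appear in more than max_share of the archives' samples."""
    n = len(samples)
    return [k for k in keywords if keyword_spread(k, samples) / n > max_share]

samples = {
    "Server-Side Scripting Archives": ["New PHP upload script", "mysql query help"],
    "Photoshop Archives": ["New brush set", "gradient tool question"],
    "OZONE Archives": ["New baby!", "happy birthday"],
}
print(weak_keywords(["new", "php", "brush", "baby"], samples))  # ['new']
```

Such a keyword can still be useful for deciding *whether* to archive; it just shouldn't count toward choosing *which* archive.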

So, three things:
1) I really think the mod-bot should be as intelligent as possible, as opposed to systematic: it should "guess", and the originating-forum information could be useless, or too much of a constraint.
2) A sticky label saying "archive this" would not do the job, as it would still require human mods to select the threads to archive.
3) Please report threads that don't seem well classified in the links above.

And feel free to question what already exists: now that I have something working, it's easy to make assumptions and hypotheses and test them quickly.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-16-2006 01:13

Quick stats regarding the current version. I think I can double the current speed of the app, but...
- at the moment, 1/3 of a second to process one thread against one forum's keywords
- an overall high score when processing the whole set of 70 sample threads against all forums: almost everything gets associated with a forum, and many of the matches are relevant
- the same time is required to process all threads against posters

12 archives vs. 17 forums... so the originating forum is not relevant, DL: mods already "should" make choices regardless of where a thread came from.

And I can now make a rough estimate of the global processing time for the whole sink as it stands. Funny, if nothing else:
(267 pages * 50 threads * 1/3 s * 12 archives) / 3600 ≈ 14.83 hours to run through the whole sink.

(It would be better to run it regularly on chunks of, say, 5 sink pages: that would only take about 30 minutes for 250 threads or so, of which about 220 would be processed. Twice a day, that means 440 threads archived a day.)
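The estimate above, written out as a small function so other chunk sizes can be tried. The parameters mirror the numbers in the post (50 threads per sink page, 1/3 s per thread per archive, 12 archives); `passes` is an added assumption to account for the posters pass, which the stats say takes about the same time.

```python
def sink_run_hours(pages, threads_per_page=50, secs_per_archive=1/3,
                   archives=12, passes=1):
    """Estimated hours to score every sink thread against every archive.

    passes=1 counts the keyword pass only; passes=2 adds the posters pass,
    assumed to cost the same time per thread.
    """
    threads = pages * threads_per_page
    seconds = threads * secs_per_archive * archives * passes
    return seconds / 3600

print(f"whole sink: {sink_run_hours(267):.2f} h")                        # whole sink: 14.83 h
print(f"5 pages, both passes: {sink_run_hours(5, passes=2) * 60:.0f} min")  # 5 pages, both passes: 33 min
```

For a 5-page chunk, the keyword pass alone comes to about 17 minutes; counting the posters pass as well gives about 33 minutes, close to the 30 minutes quoted.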

Since a single run takes a while, I will only post HTML updates occasionally from now on, when they are really interesting and bring something brand new.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-21-2006 17:19

Just over 4 minutes to process 100 threads, and it identified something like 34 of them.
I am currently calibrating the "posters" part, which, once added, should bring that up by roughly 20 threads.

http://www.beyondwonderland.com/asylum/output.html

Here is the keyword set, for now. It's hard to balance right: one "wrong" keyword can make the engine miss loads of things.

code:
// Archive name => keywords whose presence suggests a thread belongs in that archive.
$goodkeywords = array(
	"DHTML/Javascript Archives" => array("function","dhtml","jscript","javascript","js","onmouseover","onmouseout", "css", "xhtml", "domain", "onload", "preload" ), 
	"Photoshop Archives" => array("photoshop", "painting", "brush", "filter", "displace", "color", "blend", "gradient", "watercolour", "tool", "illustrator", "adobe" ), 
	"Server-Side Scripting Archives" => array("php", "echo", "perl", "sql", "c\+\+", "request", "asp", "response", "redirect", "database", "server", "apache", "java", "upload", "script", "mysql", "page", "phpbb" ), 
	"OZONE Archives" => array("ms","opera","firefox","linux", "shocking", "birthday", "zeldman", "newborn", "hilarious", "party", "ozone", "baby", "funny", "cool"), 
	"BIG SIG Archives" => array("testing", "runner up", "winner", "idea", "contest", "mask"),
	"3D Modelling & Rendering Archives" => array("mesh", "max", "bryce", "nurbs", "opengl", "cube", "solid", "poser", "volume", "coordinate", "render", "rendering"),
	"PS Pong Competition - Archived Games!" => array("bmp", "vs", "psd", "zip", "smash", "volley", "pop-corn"),
	"Print Graphics Archives" => array("layout", "cmyk", "paper", "font", "point", "inch", "truetype"),
	"Philosophy Archives" => array("debate", "life", "death", "god", "sense", "religion", "war", "meaning", "opinion", "christian", "terrorism", "evolution", "family", "belief", "atheism")
	);
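For anyone wondering how a keyword set like this turns into a classification, here is a minimal sketch in Python, assuming the score is simply a count of whole-word keyword matches per archive (the thread does not show automod's actual scoring). `GOOD_KEYWORDS` is a trimmed, hypothetical subset of the array above, with "c\+\+" written as plain "c++" and escaped by `re.escape` instead.

```python
import re

# Hypothetical subset of $goodkeywords; plain strings, escaped at match time.
GOOD_KEYWORDS = {
    "Server-Side Scripting Archives": ["php", "perl", "sql", "c++", "mysql", "apache"],
    "DHTML/Javascript Archives": ["javascript", "dhtml", "onmouseover", "css"],
    "Photoshop Archives": ["photoshop", "brush", "gradient", "filter"],
}

def keyword_pattern(word):
    # Whole-word match; lookarounds instead of \b so "c++" still matches.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)

def score_thread(text, keyword_sets=GOOD_KEYWORDS):
    """Rank archives by the number of keyword occurrences in the thread text."""
    scores = {
        archive: sum(len(keyword_pattern(w).findall(text)) for w in words)
        for archive, words in keyword_sets.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(score_thread("My PHP script can't connect to MySQL; the Apache log shows errors."))
```

Returning the whole ranking rather than only the winner keeps the runner-up visible, which helps when the top pick is wrong but the second choice is right. Note that whole-word matching means "sql" does not fire inside "MySQL", which is one reason both keywords appear in the list.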



_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-22-2006 17:01

OK, the mod-bot will go live really, really soon: details of its integration into the Asylum are being discussed in the Mad Sci forum.
That said, at the moment it is very good at spotting threads from the Server-Side and DHTML forums, just like... me.

Because it has been built and educated by... me. As I said, as an AI, it has a point of view, and that point of view should never stop evolving.
So, the system is built to "know" when it has difficulty classifying threads, but it should also be able to ask someone what to do
with those threads, and why.

And to be really accurate, it should receive input from various Asylum users, not just one.

Ideally mods, and ideally those who are listed as posters in the knowledge base.

So this needs human beings, and it needs to be able to contact them: SMS, email, IM, your pick.
If you would like to help it learn and progress in the future, please let me know.

Either drop me an email at mauro@beyondwonderlandDOTc-o_m or post right here.

Cheers.

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-26-2006 14:13

W00t. 72% of threads fetched, at an average of 2.6 seconds per thread.
And the whole processing pipeline is in place, with the exception of bad keywords.

http://www.beyondwonderland.com/structure/services/automod/asylum/output.html

I would like to reach 85% of threads fetched on each run.
2.6 seconds per thread is alright, but it can be improved.

A few threads are not archived correctly (I spotted maybe 4 of them), but in all of these cases the second choice is the right one (meaning the KB just needs finer balancing in some cases).
I will soon release the associated web service, but first I have to improve and optimize this foundation.

(It needs to get past a certain performance threshold before it really goes live; this will ensure it improves itself over time.)

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 06-08-2006 19:26

*pop*.

A couple more open questions; it's fine-tuning time <insert sly smile here>

Can you think of "Asylum events" and help me list them? Photoshop Pong, twenty-liners, but also Big Sig, Repeat Performance, and...?

Also, one forum I had never really paid attention to is the Photography forum: it's full of beautiful work!!!
Where should I teach automod to classify these, since there is no Photography archive? Miscellany, Photoshop, or...?

And for that forum, I'll add the concerned mods to the "good posters" list: Shiiizam, krets... who am I leaving out?

Thanks in advance.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 06-08-2006 20:49

photography: go by 'still valid image links', and there'll be an archive for each forum, I guess.
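A rough sketch of the 'still valid image links' check, assuming images in posts are embedded with UBB-style `[img]...[/img]` tags (an assumption; adjust the pattern to the Asylum's actual markup). `images_still_valid` and `http_status` are hypothetical helpers; the status check is passed in as a function so it can be swapped out or tested offline.

```python
import re
import urllib.error
import urllib.request

# Assumed markup for embedded images; not confirmed for the Asylum.
IMG_TAG = re.compile(r"\[img\](.*?)\[/img\]", re.IGNORECASE)

def http_status(url, timeout=10):
    """HTTP status of a HEAD request to url, or None if it is unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:   # server answered with 4xx/5xx
        return err.code
    except (urllib.error.URLError, OSError, ValueError):  # DNS, timeout, bad URL
        return None

def images_still_valid(post_text, check=http_status, min_ratio=1.0):
    """True if at least min_ratio of the post's images still return 200."""
    urls = [u.strip() for u in IMG_TAG.findall(post_text)]
    if not urls:
        return False
    ok = sum(1 for u in urls if check(u) == 200)
    return ok / len(urls) >= min_ratio
```

Some servers answer HEAD requests with 405 even when the image exists, so a GET fallback may be worth adding before trusting a negative result.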


Events:
PSPong: again, check for matches where the pictures are still available.
signature contest
repeat performance
the photography challenges
don't forget those 'we are telling a big story, post by post' threads that pop up around Christmas
birthday congratulations
20 liners

that's all off the top of my head right now...


