calculation. This is ideal when there is very limited data
(it is enabled by default).
--enable-robinson
Enables Robinson's geometric mean test. The differences are:
- A window-size of 25 is used instead of 15
- The combination algorithm is different. See:
http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
for more information.
This algorithm is obsolete, and not recommended for production builds.
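As a rough illustration of the combination step, the following is a minimal
Python sketch of the geometric-mean formula from Robinson's essay linked
above. It is not taken from libdspam.c. The function name is hypothetical,
and reading "window-size" as "the number of most interesting tokens, i.e.
those furthest from 0.5" is an assumption.

```python
import math

WINDOW_SIZE = 25  # from the text above; its exact meaning here is an assumption


def robinson_geometric_mean(probs):
    """Combine per-token spam probabilities, each strictly between 0 and 1."""
    # Assumption: the window keeps the tokens that deviate most from 0.5.
    window = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:WINDOW_SIZE]
    n = len(window)
    if n == 0:
        return 0.5  # no evidence either way

    # Sum logarithms instead of multiplying, so long products do not underflow.
    log_prod_p = sum(math.log(p) for p in window)
    log_prod_not_p = sum(math.log(1.0 - p) for p in window)

    P = 1.0 - math.exp(log_prod_not_p / n)  # 1 - geometric mean of (1 - p)
    Q = 1.0 - math.exp(log_prod_p / n)      # 1 - geometric mean of p
    S = (P - Q) / (P + Q)                   # ranges from -1 (ham) to +1 (spam)
    return (1.0 + S) / 2.0                  # rescaled to the 0..1 range
```

A result near 1.0 suggests spam, a result near 0.0 suggests innocent mail,
and 0.5 means the tokens carry no net evidence.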
--enable-chi-square
Enables Fisher-Robinson's Inverse Chi-Square algorithm.
Defaults in libdspam.c:
- Exclusionary radius of 0.45
- Ham/Spam Cutoff of 0.5
- Strength: 0.1
- Assumed probability: 0.5
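To show how two of these defaults might interact, here is a minimal Python
sketch of Fisher-Robinson inverse chi-square combining as it is usually
described. It is not the code in libdspam.c, and the function names are
hypothetical. It assumes the exclusionary radius means "ignore tokens whose
probability lies within that distance of 0.5" and that a score above the
cutoff is treated as spam. Strength and assumed probability affect how the
per-token values are produced and are not shown here.

```python
import math

EXCLUSIONARY_RADIUS = 0.45  # default listed above
CUTOFF = 0.5                # default listed above


def chi2q(x2, dof):
    """Chance that a chi-square variable with an even `dof` exceeds x2."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def chi_square_score(probs):
    """Combine token probabilities, each strictly between 0 and 1."""
    # Assumption: tokens too close to 0.5 are excluded from the calculation.
    kept = [p for p in probs if abs(p - 0.5) >= EXCLUSIONARY_RADIUS]
    n = len(kept)
    if n == 0:
        return 0.5  # nothing decisive to combine

    # Close to 1 when the kept tokens are mostly near 1 (spam-like).
    spam_evidence = chi2q(-2.0 * sum(math.log(p) for p in kept), 2 * n)
    # Close to 1 when the kept tokens are mostly near 0 (innocent-like).
    ham_evidence = chi2q(-2.0 * sum(math.log(1.0 - p) for p in kept), 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0


def is_spam(probs):
    # Assumption: strictly above the cutoff counts as spam.
    return chi_square_score(probs) > CUTOFF
```

With a radius of 0.45 only very decisive tokens take part, so a message made
up of mildly suspicious tokens scores exactly 0.5 in this sketch.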
NOTE: You may have multiple algorithms enabled simultaneously; if any of
the enabled algorithms believes the message is spam, it will be marked
accordingly. Naturally, every enabled algorithm can also contribute its
own false positives, so it is recommended to either stick with a single
algorithm, or use only Bayesian or only Robinson-type algorithms.
Combining Bayesian and Alt-Bayesian (not using Robinson's at all) seems
to be the most effective combination.
For this reason, if you plan on enabling any algorithms which are
disabled by default, it is strongly recommended that you also specify:
--disable-traditional-bayesian --disable-alternative-bayesian
Generally, the alternative-Bayesian algorithm appears to catch some spam
that the traditional Bayesian algorithm does not; however, it also misses
far more spam than the traditional algorithm. Therefore, an
implementation using both Bayesian algorithms appears to be the most
effective in catching spam.
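The "any enabled algorithm can mark the message" rule from the note above
can be pictured with a small Python sketch. The function and the dictionary
keys are illustrative labels only, not identifiers from dspam.

```python
def final_verdict(verdicts):
    """`verdicts` maps an algorithm label to True if it judged the message spam."""
    # A single "spam" vote is enough, so each extra algorithm can only
    # add detections (and false positives), never remove them.
    return any(verdicts.values())


# An innocent message that only one algorithm gets wrong is still marked.
innocent_message = {"traditional-bayesian": False, "chi-square": True}
marked = final_verdict(innocent_message)
```

This is why mixing algorithm families tends to raise the false positive
count: the combined error set is the union of the individual ones.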
--disable-bias
When bias is disabled, dspam no longer biases the statistics in favor of
innocent mail, but weighs spam and innocent tokens equally in the
calculation. This may provide more effective spam filtering, but has been
shown to increase the number of false positives.
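The text does not say how the bias is applied. The Python sketch below
assumes it is the doubling of innocent hit counts from Paul Graham's
"A Plan for Spam"; the function name and parameters are hypothetical and
this is not dspam's actual code.

```python
def graham_probability(spam_hits, innocent_hits, total_spam, total_innocent,
                       bias=True):
    """Spam probability of one token; both totals must be greater than zero."""
    if spam_hits + innocent_hits == 0:
        raise ValueError("token has never been seen")

    # Assumption: with bias enabled, innocent hits count double.
    weight = 2 if bias else 1
    spam_ratio = min(1.0, spam_hits / total_spam)
    innocent_ratio = min(1.0, weight * innocent_hits / total_innocent)

    p = spam_ratio / (spam_ratio + innocent_ratio)
    return max(0.01, min(0.99, p))  # keep clear of absolute certainty
```

For a token seen in 10 of 100 spams and 10 of 100 innocent messages, this
sketch gives about 0.33 with bias and 0.5 without, which shows how disabling
bias shifts borderline tokens toward spam.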
--enable-robinson-pvalues
Enables Robinson's technique for combining p-values. This is an alternative
approach to generating word probabilities, described here:
http://www.linuxjournal.com/article.php?sid=6467
Robinson's p-values are presently used in Chi-Square calculations, but
enabling them with this flag will use them for *all* calculations,
effectively replacing (or rather building upon) Graham's tokenization
approach. This flag may also be used without enabling Chi-Square.
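For orientation, the word probability described in the linked article can be
sketched in Python as follows. This follows the article's formula rather
than dspam's source. Reusing the strength (0.1) and assumed probability
(0.5) listed as Chi-Square defaults is an assumption, and the function name
is hypothetical.

```python
STRENGTH = 0.1             # "s": weight given to the background assumption
ASSUMED_PROBABILITY = 0.5  # "x": probability assigned to an unseen word


def robinson_word_probability(spam_hits, innocent_hits, total_spam,
                              total_innocent):
    """Robinson's f(w); both totals must be greater than zero."""
    n = spam_hits + innocent_hits  # how much data exists for this word
    if n == 0:
        return ASSUMED_PROBABILITY

    b = spam_hits / total_spam          # share of spam containing the word
    g = innocent_hits / total_innocent  # share of innocent mail containing it
    p = b / (b + g)                     # raw estimate from the data alone

    # Blend the raw estimate with the assumed probability; the more often
    # the word has been seen, the less the assumption matters.
    return (STRENGTH * ASSUMED_PROBABILITY + n * p) / (STRENGTH + n)
```

A word seen once, only in spam, scores about 0.95 here instead of 1.0, so
rarely seen words cannot dominate the result.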
--disable-test-conditional
Disables test-conditional training. Test-conditional training is a more
aggressive approach to training than traditional training, and provides more
inoculous results rapidly.
Enabled by default, this mode of training will automatically re-train the
user's dictionary on a spam or false positive until the training condition
is met (e.g. until the user's dictionary no longer results in
misclassification of the message being retrained). Retraining runs for a
maximum of 5 iterations, and is only invoked when:
- The user has > 1000 innocent messages in their corpus, and is reporting
a spam
- The user is reporting a false positive (regardless of the number of

