FuzzyOcr does not detect spam

debianfan
Posts: 164
Joined: 2002-08-17 18:40

Post by debianfan »

Hello,

FuzzyOcr is running, but only partially.

When I send the image from the test mails (which ship with it) to myself,
nothing happens.

But if I, for example, turn the PNG file into a .jpg file and send that,
I get:

X-Spam-Status: No, hits=3.4 required=4.0 tests=AWL,BAYES_00,
FUZZY_OCR_WRONG_CTYPE,FUZZY_OCR_WRONG_EXTENSION,HTML_40_50,HTML_MESSAGE,
RCVD_IN_SORBS_DUL,SHORT_HELO_AND_INLINE_IMAGE autolearn=no version=3.1.8

It only detects that the file format is wrong, i.e. that I tried to pass
a PNG off as a JPG file.

However, it does not say a word about the Viagra ad in the mail :-(
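
To make it easier to see which rules fired, the header above can be split into its fields. This is a minimal sketch using only the Python standard library; the function name `parse_spam_status` and the `FUZZY_OCR` prefix filter are illustrative assumptions, not part of SpamAssassin or FuzzyOcr.

```python
import re

# Header value copied from the post above (folded over three lines).
HEADER = """X-Spam-Status: No, hits=3.4 required=4.0 tests=AWL,BAYES_00,
FUZZY_OCR_WRONG_CTYPE,FUZZY_OCR_WRONG_EXTENSION,HTML_40_50,HTML_MESSAGE,
RCVD_IN_SORBS_DUL,SHORT_HELO_AND_INLINE_IMAGE autolearn=no version=3.1.8"""


def parse_spam_status(header: str) -> dict:
    """Split an X-Spam-Status header into verdict, scores and rule names."""
    # Unfold the header: collapse line breaks and repeated whitespace.
    flat = " ".join(header.split())
    verdict = flat.split(",", 1)[0].split(":", 1)[1].strip()
    hits = float(re.search(r"hits=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", flat).group(1))
    tests_raw = re.search(r"tests=(.*?)\s+autolearn=", flat).group(1)
    tests = [t.strip() for t in tests_raw.split(",") if t.strip()]
    return {
        "verdict": verdict,
        "hits": hits,
        "required": required,
        "tests": tests,
        # Assumption: FuzzyOcr rule names start with FUZZY_OCR, as in this header.
        "fuzzy_tests": [t for t in tests if t.startswith("FUZZY_OCR")],
    }


status = parse_spam_status(HEADER)
print(status["verdict"], status["hits"], status["required"])
print(status["fuzzy_tests"])
```

For this header the only FuzzyOcr rules listed are the two format checks (`FUZZY_OCR_WRONG_CTYPE` and `FUZZY_OCR_WRONG_EXTENSION`), which matches the observation that the image content itself was not scored.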

When I run the mail (I use Courier) directly through

spamassassin --debug FuzzyOcr < meinemailineinerdatei > /dev/null

then it detects the spam.

Now I'm a bit at a loss.

Regards,

Sebastian


Re: FuzzyOcr does not detect spam

Post by debianfan »

In debug mode it also prints the following:

spamassassin --debug FuzzyOcr < msg.fE4U:2,S
Subroutine FuzzyOcr::O_CREAT redefined at /usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_EXCL redefined at /usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
Subroutine FuzzyOcr::O_RDWR redefined at /usr/local/lib/perl5/5.8.8/Exporter.pm line 65.
at /usr/local/lib/perl5/5.8.8/i686-linux/POSIX.pm line 19
[31742] dbg: FuzzyOcr: focr_bin_helper: 'pnmnorm,pnminvert,convert,ppmtopgm,tesseract'
[31742] info: FuzzyOcr: Adding <5> new helper apps
[31742] info: FuzzyOcr: Starting preprocessor parser for file "/etc/mail/spamassassin/FuzzyOcr.preps"...
[31742] dbg: FuzzyOcr: line: preprocessor normalize {
[31742] dbg: FuzzyOcr: line: command = pnmnorm
[31742] dbg: FuzzyOcr: line: }
[31742] dbg: FuzzyOcr: line: preprocessor invert {
[31742] dbg: FuzzyOcr: line: command = pnminvert
[31742] dbg: FuzzyOcr: line: }
[31742] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
[31742] dbg: FuzzyOcr: line: command = ppmtopgm
[31742] dbg: FuzzyOcr: line: }
[31742] dbg: FuzzyOcr: line: preprocessor pamtopnm {
[31742] dbg: FuzzyOcr: line: command = pamtopnm
[31742] dbg: FuzzyOcr: line: }
[31742] dbg: FuzzyOcr: line: preprocessor pamthreshold {
[31742] dbg: FuzzyOcr: line: command = pamthreshold
[31742] dbg: FuzzyOcr: line: args = -simple -threshold 0.5
[31742] dbg: FuzzyOcr: line: }
[31742] dbg: FuzzyOcr: line: preprocessor maketiff {
[31742] dbg: FuzzyOcr: line: command = pnmtotiff
[31742] dbg: FuzzyOcr: line: args = -color -truecolor
[31742] dbg: FuzzyOcr: line: }
[31742] info: FuzzyOcr: Starting scanset parser for file "/etc/mail/spamassassin/FuzzyOcr.scansets"...
[31742] dbg: FuzzyOcr: line scanset ocrad {
[31742] dbg: FuzzyOcr: line command = $ocrad
[31742] dbg: FuzzyOcr: line args = -s5 $input
[31742] dbg: FuzzyOcr: line }
[31742] dbg: FuzzyOcr: line scanset ocrad-invert {
[31742] dbg: FuzzyOcr: line command = $ocrad
[31742] dbg: FuzzyOcr: line args = -s5 -i $input
[31742] dbg: FuzzyOcr: line }
[31742] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert {
[31742] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[31742] dbg: FuzzyOcr: line command = $ocrad
[31742] dbg: FuzzyOcr: line args = -s5 -i $input
[31742] dbg: FuzzyOcr: line }
[31742] dbg: FuzzyOcr: line scanset ocrad-decolorize {
[31742] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[31742] dbg: FuzzyOcr: line command = $ocrad
[31742] dbg: FuzzyOcr: line args = -s5 $input
[31742] dbg: FuzzyOcr: line }
[31742] dbg: FuzzyOcr: line scanset gocr {
[31742] dbg: FuzzyOcr: line command = $gocr
[31742] dbg: FuzzyOcr: line args = -i $input
[31742] dbg: FuzzyOcr: line }
[31742] dbg: FuzzyOcr: line scanset gocr-180 {
[31742] dbg: FuzzyOcr: line command = $gocr
[31742] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
[31742] dbg: FuzzyOcr: line }
[31742] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
[31742] info: FuzzyOcr: Searching in: /usr/local/bin
[31742] info: FuzzyOcr: Searching in: /usr/bin
[31742] info: FuzzyOcr: Using gifsicle => /usr/local/bin/gifsicle
[31742] info: FuzzyOcr: Using giffix => /usr/bin/giffix
[31742] info: FuzzyOcr: Using giftext => /usr/bin/giftext
[31742] info: FuzzyOcr: Using gifinter => /usr/bin/gifinter
[31742] info: FuzzyOcr: Using giftopnm => /usr/bin/giftopnm
[31742] info: FuzzyOcr: Using jpegtopnm => /usr/bin/jpegtopnm
[31742] info: FuzzyOcr: Using pngtopnm => /usr/bin/pngtopnm
[31742] info: FuzzyOcr: Using bmptopnm => /usr/bin/bmptopnm
[31742] info: FuzzyOcr: Using tifftopnm => /usr/bin/tifftopnm
[31742] info: FuzzyOcr: Using ppmhist => /usr/bin/ppmhist
[31742] info: FuzzyOcr: Using pamfile => /usr/bin/pamfile
[31742] info: FuzzyOcr: Using ocrad => /usr/local/bin/ocrad
[31742] info: FuzzyOcr: Using gocr => /usr/bin/gocr
[31742] info: FuzzyOcr: Using pnmnorm => /usr/bin/pnmnorm
[31742] info: FuzzyOcr: Using pnminvert => /usr/bin/pnminvert
[31742] info: FuzzyOcr: Using convert => /usr/bin/convert
[31742] info: FuzzyOcr: Using ppmtopgm => /usr/bin/ppmtopgm
[31742] info: FuzzyOcr: Using tesseract => /usr/local/bin/tesseract
[31742] dbg: FuzzyOcr: Threshold[max_hash] => 5
[31742] dbg: FuzzyOcr: Threshold[c] => 5
[31742] dbg: FuzzyOcr: Threshold[s] => 0.01
[31742] dbg: FuzzyOcr: Threshold[w] => 0.01
[31742] dbg: FuzzyOcr: Threshold[h] => 0.01
[31742] dbg: FuzzyOcr: Threshold[cn] => 0.01
[31742] dbg: FuzzyOcr: focr_add_score => 1
[31742] dbg: FuzzyOcr: focr_autodisable_negative_score => -5
[31742] dbg: FuzzyOcr: focr_autodisable_score => 1000
[31742] dbg: FuzzyOcr: focr_autosort_buffer => 10
[31742] dbg: FuzzyOcr: focr_autosort_scanset => 1
[31742] dbg: FuzzyOcr: focr_base_score => 5
[31742] dbg: FuzzyOcr: focr_corrupt_score => 2.5
[31742] dbg: FuzzyOcr: focr_corrupt_unfixable_score => 5
[31742] dbg: FuzzyOcr: focr_counts_required => 2
[31742] dbg: FuzzyOcr: focr_db_hash => /etc/mail/spamassassin/FuzzyOcr.db
[31742] dbg: FuzzyOcr: focr_db_max_days => 35
[31742] dbg: FuzzyOcr: focr_db_safe => /etc/mail/spamassassin/FuzzyOcr.safe.db
[31742] dbg: FuzzyOcr: focr_digest_db => /etc/mail/spamassassin/FuzzyOcr.hashdb
[31742] dbg: FuzzyOcr: focr_enable_image_hashing => 2
[31742] dbg: FuzzyOcr: focr_global_timeout => 0
[31742] dbg: FuzzyOcr: focr_global_wordlist => /etc/mail/spamassassin/FuzzyOcr.words
[31742] dbg: FuzzyOcr: focr_hashing_learn_scanned => 1
[31742] dbg: FuzzyOcr: focr_keep_bad_images => 0
[31742] dbg: FuzzyOcr: focr_log_pmsinfo => 1
[31742] dbg: FuzzyOcr: focr_log_stderr => 1
[31742] dbg: FuzzyOcr: focr_logfile => /home/fuzzyocr.log
[31742] dbg: FuzzyOcr: focr_max_height => 800
[31742] dbg: FuzzyOcr: focr_max_width => 800
[31742] dbg: FuzzyOcr: focr_min_height => 4
[31742] dbg: FuzzyOcr: focr_min_width => 4
[31742] dbg: FuzzyOcr: focr_minimal_scanset => 1
[31742] dbg: FuzzyOcr: focr_mysql_db => FuzzyOcr
[31742] dbg: FuzzyOcr: focr_mysql_hash => Hash
[31742] dbg: FuzzyOcr: focr_mysql_host => localhost
[31742] dbg: FuzzyOcr: focr_mysql_port => 3306
[31742] dbg: FuzzyOcr: focr_mysql_safe => Safe
[31742] dbg: FuzzyOcr: focr_mysql_update_hash => 0
[31742] dbg: FuzzyOcr: focr_mysql_user => fuzzyocr
[31742] dbg: FuzzyOcr: focr_no_homedirs => 0
[31742] dbg: FuzzyOcr: focr_path_bin => /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
[31742] dbg: FuzzyOcr: focr_personal_wordlist => __userstate__/FuzzyOcr.words
[31742] dbg: FuzzyOcr: focr_preprocessor_file => /etc/mail/spamassassin/FuzzyOcr.preps
[31742] dbg: FuzzyOcr: focr_scanset_file => /etc/mail/spamassassin/FuzzyOcr.scansets
[31742] dbg: FuzzyOcr: focr_score_ham => 0
[31742] dbg: FuzzyOcr: focr_skip_bmp => 0
[31742] dbg: FuzzyOcr: focr_skip_gif => 0
[31742] dbg: FuzzyOcr: focr_skip_jpeg => 0
[31742] dbg: FuzzyOcr: focr_skip_png => 0
[31742] dbg: FuzzyOcr: focr_skip_tiff => 0
[31742] dbg: FuzzyOcr: focr_skip_updates => 0
[31742] dbg: FuzzyOcr: focr_strip_numbers => 1
[31742] dbg: FuzzyOcr: focr_threshold => 0.25
[31742] dbg: FuzzyOcr: focr_timeout => 10
[31742] dbg: FuzzyOcr: focr_twopass_scoring_factor => 1.5
[31742] dbg: FuzzyOcr: focr_unique_matches => 0
[31742] dbg: FuzzyOcr: focr_verbose => 1
[31742] dbg: FuzzyOcr: focr_wrongctype_score => 1.5
[31742] dbg: FuzzyOcr: focr_wrongext_score => 1.5
[31742] info: FuzzyOcr: Loaded preprocessor normalize: /usr/bin/pnmnorm
[31742] info: FuzzyOcr: Loaded preprocessor invert: /usr/bin/pnminvert
[31742] info: FuzzyOcr: Loaded preprocessor ppmtopgm: /usr/bin/ppmtopgm
[31742] info: FuzzyOcr: Loaded preprocessor pamtopnm: pamtopnm
[31742] info: FuzzyOcr: Loaded preprocessor pamthreshold: pamthreshold -simple -threshold 0.5
[31742] info: FuzzyOcr: Loaded preprocessor maketiff: pnmtotiff -color -truecolor
[31742] info: FuzzyOcr: Using scan ocrad: /usr/local/bin/ocrad -s5 $input
[31742] info: FuzzyOcr: Using scan ocrad-invert: /usr/local/bin/ocrad -s5 -i $input
[31742] info: FuzzyOcr: Using scan ocrad-decolorize-invert: /usr/local/bin/ocrad -s5 -i $input
[31742] info: FuzzyOcr: Using scan ocrad-decolorize: /usr/local/bin/ocrad -s5 $input
[31742] info: FuzzyOcr: Using scan gocr: /usr/bin/gocr -i $input
[31742] info: FuzzyOcr: Using scan gocr-180: /usr/bin/gocr -l 180 -d 2 -i $input
[31742] info: FuzzyOcr: Added <43> words from "/etc/mail/spamassassin/FuzzyOcr.words"
[31742] info: rules: meta test DIGEST_MULTIPLE has undefined dependency 'DCC_CHECK'
[31742] dbg: FuzzyOcr: Starting FuzzyOcr...
[31742] info: FuzzyOcr: Processing Message with ID "<CHEAKKAEGFCPJIGACABIIEOOCEAA.post@domainname.de>"
[31742] dbg: FuzzyOcr: Skipping OCR, no image files found...
[31742] dbg: FuzzyOcr: Processed in 0.000846 sec.
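
Since the log ends with "Skipping OCR, no image files found...", one way to check what a saved message actually contains is to list its MIME parts. The following is a minimal sketch with Python's standard `email` package. The helper name `list_image_parts` and the extension list are assumptions made for illustration, and the check does not reproduce FuzzyOcr's own image detection; it only shows which parts are declared as images or carry an image-like file name.

```python
from email import message_from_bytes
from email.message import EmailMessage

# Assumption: file extensions treated as "image-like" for this check.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def list_image_parts(raw: bytes) -> list:
    """Return (content_type, filename) for every leaf part that looks like an image."""
    msg = message_from_bytes(raw)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        ctype = part.get_content_type()
        name = part.get_filename() or ""
        if ctype.startswith("image/") or name.lower().endswith(IMAGE_EXTENSIONS):
            found.append((ctype, name))
    return found


# Self-contained demo: one message with a PNG attachment, one without.
with_image = EmailMessage()
with_image["Subject"] = "test"
with_image.set_content("see attachment")
with_image.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image",
                          subtype="png", filename="test.png")

plain = EmailMessage()
plain["Subject"] = "test"
plain.set_content("no image here")

image_parts = list_image_parts(with_image.as_bytes())
plain_parts = list_image_parts(plain.as_bytes())
print(image_parts)
print(plain_parts)
```

To run it against a real message, read the file in binary mode and pass the bytes to `list_image_parts`, for example the `msg.fE4U:2,S` file used in the command above. An empty result would be consistent with the "no image files found" line in the log.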