IFILE(1)                         User Commands                        IFILE(1)

NAME
ifile - core executable for the ifile mail filtering system

SYNOPSIS
ifile [-b file] [-q|-Q] [-g] [-k] [-o] [-v num] [lexing options] file
ifile -c -q|-Q [-T threshold] [-b file] [-g] [-k] [-o] [lexing options]
ifile [-b file] [-d folder] [-i folder|-u folder] [-g] [-k] [-o] [-v num]
      [lexing options] file ...
ifile -r [-b file]

DESCRIPTION
ifile is a mail filter client that uses machine learning to classify
e-mail into folders/mailboxes. The algorithm that it uses is called
Naive Bayes. Basically, Naive Bayes considers each document an
unordered collection of words and classifies by matching the document
distribution with the most closely matching folder/mailbox
distribution.
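
As a rough illustration of the idea only (this is not ifile's actual
implementation), the following shell sketch scores a message against two
folders and prints the better match. The function name nb_classify, the
word counts, the add-one smoothing and the equal folder priors are all
assumptions made for the example.

```shell
# Toy Naive Bayes classifier. All counts below are made-up illustration
# data; nb_classify is a hypothetical helper, not part of ifile.
nb_classify() {
    # Arguments: the words of the message, in any order.
    printf '%s\n' "$@" | awk '
        BEGIN {
            # Hypothetical training counts: cnt[folder, word]
            cnt["spam", "money"] = 8;    cnt["spam", "lunch"] = 1
            cnt["friends", "money"] = 1; cnt["friends", "lunch"] = 6
            total["spam"] = 9; total["friends"] = 7   # tokens per folder
            vocab = 2                                  # distinct words
        }
        { words[NR] = $1 }
        END {
            best = ""
            for (f in total) {
                score = 0
                for (i = 1; i <= NR; i++)
                    # add-one smoothing avoids log(0) for unseen words
                    score += log((cnt[f, words[i]] + 1) / (total[f] + vocab))
                if (best == "" || score > bestscore) {
                    best = f
                    bestscore = score
                }
            }
            print best
        }'
}
```

Word order does not affect the result, which is what "unordered
collection of words" means in practice: only the per-folder word counts
enter the score.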

OPTIONS
-b, --db-file=file
       Location to read/store ifile database. Default is ~/.idata
-c
       Equivalent of "ifile -v 0 | head -1 | cut -f1 -d". Must be used
       with -q or -Q.
-d folder
       Delete the statistics for each of the files from the category
       folder
-f, --folder-calcs=folder
       Show the word probability calculations for folder
-g, --log-file
       Create and store debugging information in ~/.ifile.log
-i, --insert=folder
       Add the statistics for each of the files to the category folder
-k, --keep-infrequent
       Leave in the database words that occur infrequently (normally
       they are tossed)
-l, --query-loocv=folder
       For each of the files, temporarily removes file from folder,
       performs query and then reinserts file in folder. Database is
-o
       Uses document bit vector representation. Count each word once
-q, --query
       Output rating scores for each of the files
-Q, --query-insert
       For each of the files, output rating scores and add statistics
       for the folder with the highest score
-T threshold
       When used with both -c and -q, output the two highest ranking
       categories if their scores differ by at most threshold / 1000,
       which can be used to detect border cases. When used with -q
       only and any threshold > 0, output the score difference
       percentage. For example,
              ifile -T1 -q foo.txt
       might result in
              non-spam 18728.00272369
              diff[spam,non-spam](%) 9.21
       If so, then
              ifile -T93 -q -c foo.txt
       will result in
              foo.txt spam,non-spam
              ifile -T92 -q -c foo.txt
       will result in
              foo.txt spam
-r, --reset-data
       Erases all currently stored information
-u folder
       Same as 'insert' except only adds stats if folder already exists
-v num
       Amount of output while running: 0=silent, 1=quiet, 2=progress,
-a, --alpha-lexer
       Lex words as sequences of alphabetic characters (default)
-A, --alpha-only-lexer
       Only lex space-separated character sequences which are composed
       entirely of alphabetic characters
-h, --strip-header
       Skip all of the header lines except Subject:, From: and To:
-m, --max-length=char
       Ignore portion of message after first char characters. Use
       entire message if char set to 0. Default is 50,000.
-p, --print-tokens
       Just tokenize and print, don't do any other processing.
       Documents are returned as a list of word, frequency pairs.
-s, --no-stoplist
       Do not throw out overly frequent (stoplist) words when lexing
Use 'Porter' stemming algorithm when lexing documents
-w, --white-lexer
       Lex words as sequences of space-separated characters
If no files are specified on the command line, ifile will use standard
input as its message to process.
-?, --help
       Give this help list
--usage
       Give a short usage message
-V, --version
       Print program version
Mandatory or optional arguments to long options are also mandatory or
optional for any corresponding short options.
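
The -T rule described in the options above can be checked by hand. The
sketch below assumes that the percentage printed by "ifile -T1 -q" is the
quantity compared against threshold / 1000 when -c and -q are combined;
that reading is inferred from the 9.21 / T93 / T92 example and is not
confirmed elsewhere in this page. The function name both_reported is
hypothetical.

```shell
# Succeeds when, under the assumed rule, "ifile -T <threshold> -c -q"
# would report the two highest ranking categories.
both_reported() {
    # $1: value passed to -T, $2: score difference in percent
    awk -v t="$1" -v d="$2" 'BEGIN { exit !((d / 100) <= (t / 1000)) }'
}

# Usage: a 9.21% difference is a border case for -T93 but not for -T92.
# both_reported 93 9.21 && echo "border case"
```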

FILES
~/.idata
       ifile database (default location). See FAQ included in ifile
       package for description of database format.
and many others. See the
ChangeLog for the full list.
Before using ifile, you need to train it. Let's say that you have
three folders, "spam", "ifile" and "friends", and the following
directory structure:
       /-+-spam----+-1
         |         +-2
         |         +-3
         +-ifile---+-1
         |         +-2
         |         +-3
         +-friends-+-1
                   +-2
                   +-3
The following commands build the ifile database in ~/.idata (use the -b
option to specify a different location for the database):
       ifile -h -i spam /spam/*
       ifile -h -i ifile /ifile/*
       ifile -h -i friends /friends/*
The -h option strips off headers besides "Subject:", "From:" and "To:".
I find that -h improves ifile's performance, but you may find otherwise
for your personal collection.
Note that we have made the argument to -i the same as the corresponding
folder name. This is not necessary. The argument to -i can be any word
you want to use to identify a category of e-mails. The argument to -i
must not include whitespace characters (space, tab, line feed, etc.).
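
A wrapper script could reject unusable category names before calling
ifile. The following is a minimal sketch; the function name
valid_category is hypothetical, and treating an empty name as invalid is
an extra assumption not stated above.

```shell
# Succeeds only if the name can be used as an argument to -i.
valid_category() {
    case $1 in
        '' | *[[:space:]]*) return 1 ;;   # empty, or contains whitespace
        *) return 0 ;;
    esac
}

# Usage:
# valid_category "$name" && ifile -h -i "$name" "$dir"/*
```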
At this point, your ~/.idata file should look something like this:
       spam ifile friends
       662 1020 6451
       3 3 3
       jrennie 9 0:3 1:18 2:16
       mindspring 6 1:7 2:5
       make 9 0:5 1:3
       yahoo 9 0:1 1:22 2:2
The first line is the space-separated list of folders. Their ordering
specifies a numbering (spam=0, ifile=1, friends=2). The second line is
a token count for each folder (e.g. 662 tokens observed in the three
spam messages). The third line is an e-mail count for each folder (e.g.
3 e-mails for each of spam, ifile and friends). Each following line
specifies statistics for a word. The format of a line is
       word age folder:count [folder:count ...]
where folder is the folder number determined by the first line
ordering. Folders with a count of zero are not listed. So, the line
beginning with "jrennie" indicates that "jrennie" appeared 3 times in
"spam" e-mails, 18 times in "ifile" e-mails and 16 times in "friends"
e-mails. The age is the number of e-mails that have been processed
since the word was added to the database. Very infrequent words are
pruned from the database to keep the database size down.
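
The format above is simple enough to inspect with standard tools. The
following sketch looks up one word and prints its count per folder name.
It assumes the database is laid out exactly as in the sample, with
whitespace-separated fields; the function name idata_word_counts is
hypothetical.

```shell
# Print "folder count" lines for one word of an ifile-style database.
idata_word_counts() {
    # $1: database path, $2: word to look up
    awk -v w="$2" '
        NR == 1 {                      # folder names, numbered from 0
            for (i = 1; i <= NF; i++) name[i - 1] = $i
            next
        }
        NR <= 3 { next }               # token and e-mail counts, unused here
        $1 == w {
            for (i = 3; i <= NF; i++) {    # fields 3.. are folder:count
                split($i, pair, ":")
                print name[pair[1]], pair[2]
            }
        }' "$1"
}

# Usage:
# idata_word_counts ~/.idata jrennie
```

Run against the sample database shown earlier, the lookup for "jrennie"
would print "spam 3", "ifile 18" and "friends 16" on separate lines.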
Now that you have a database, you might want to filter some e-mails.
Say you have the following incoming e-mails:
       /-inbox-+-1
               +-2
               +-3
To find out what folders ifile thinks these e-mails belong in, run
       ifile -c -q /inbox/1
       ifile -c -q /inbox/2
       ifile -c -q /inbox/3
Let's say that 1 is about ifile, 2 is spam and 3 is from a friend.
Assuming ifile does its job correctly, you'll see output like this:
       /inbox/1 ifile
       /inbox/2 spam
       /inbox/3 friends
With so little training data, ifile is unlikely to get the labels
correct, but you should get the idea :)
Now, if you move the e-mails to the folders suggested by ifile, you'll
want to update the database accordingly. You can do this with the -i
option, like before. Or, you can simply use -Q in place of -q above.
This automatically adds the e-mail to the folder ifile suggests.
Now, assume for a moment that e-mail 1 was actually spam. We've added 1
to ifile and put it in the ifile folder. We need to move it to the spam
folder and update the ifile database accordingly. We can update the
database with the following command:
       ifile -d ifile -i spam /inbox/1
This deletes the e-mail from "ifile" and adds it to "spam".
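
If corrections like this happen often, the command can be wrapped in a
small helper. This is only a sketch: the function name refile and the
IFILE environment variable (used to point at a different ifile binary,
or at a stub when testing) are hypothetical and not part of ifile.

```shell
# Move the statistics of one or more messages from one category to another.
refile() {
    # $1: category the messages were wrongly added to
    # $2: category they belong in
    # remaining arguments: the message files
    wrong=$1
    right=$2
    shift 2
    "${IFILE:-ifile}" -d "$wrong" -i "$right" "$@"
}

# Usage, equivalent to the command above:
# refile ifile spam /inbox/1
```

The helper only updates the database; moving the message file itself to
the right mail folder is still up to you.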
Examples of how to use ifile together with procmail(1) and metamail(1)
can be found in the directory /usr/share/doc/ifile/examples.
ifile 1.3.4 November 2004 IFILE(1)