man httrack Command

This tutorial shows the man page for the httrack command in Linux.

Open a terminal and type the command shown below:
man httrack

The result of the command execution is shown below:

httrack(1)                                                                                                                                                httrack(1)



NAME
httrack - offline browser : copy websites to a local directory

SYNOPSIS
httrack [ url ]... [ -filter ]... [ +filter ]... [ -O, --path ] [ -%O, --chroot ] [ -w, --mirror ] [ -W, --mirror-wizard ] [ -g, --get-files ] [ -i, --continue ]
[ -Y, --mirrorlinks ] [ -P, --proxy ] [ -%f, --httpproxy-ftp[=N] ] [ -%b, --bind ] [ -rN, --depth[=N] ] [ -%eN, --ext-depth[=N] ] [ -mN, --max-files[=N] ]
[ -MN, --max-size[=N] ] [ -EN, --max-time[=N] ] [ -AN, --max-rate[=N] ] [ -%cN, --connection-per-second[=N] ] [ -GN, --max-pause[=N] ]
[ -%mN, --max-mms-time[=N] ] [ -cN, --sockets[=N] ] [ -TN, --timeout ] [ -RN, --retries[=N] ] [ -JN, --min-rate[=N] ] [ -HN, --host-control[=N] ]
[ -%P, --extended-parsing[=N] ] [ -n, --near ] [ -t, --test ] [ -%L, --list ] [ -%S, --urllist ] [ -NN, --structure[=N] ] [ -%D, --cached-delayed-type-check ]
[ -%M, --mime-html ] [ -LN, --long-names[=N] ] [ -KN, --keep-links[=N] ] [ -x, --replace-external ] [ -%x, --disable-passwords ] [ -%q, --include-query-string ]
[ -o, --generate-errors ] [ -X, --purge-old[=N] ] [ -%p, --preserve ] [ -bN, --cookies[=N] ] [ -u, --check-type[=N] ] [ -j, --parse-java[=N] ]
[ -sN, --robots[=N] ] [ -%h, --http-10 ] [ -%k, --keep-alive ] [ -%B, --tolerant ] [ -%s, --updatehack ] [ -%u, --urlhack ] [ -%A, --assume ]
[ -@iN, --protocol[=N] ] [ -%w, --disable-module ] [ -F, --user-agent ] [ -%R, --referer ] [ -%E, --from ] [ -%F, --footer ] [ -%l, --language ]
[ -C, --cache[=N] ] [ -k, --store-all-in-cache ] [ -%n, --do-not-recatch ] [ -%v, --display ] [ -Q, --do-not-log ] [ -q, --quiet ] [ -z, --extra-log ]
[ -Z, --debug-log ] [ -v, --verbose ] [ -f, --file-log ] [ -f2, --single-log ] [ -I, --index ] [ -%i, --build-top-index ] [ -%I, --search-index ]
[ -pN, --priority[=N] ] [ -S, --stay-on-same-dir ] [ -D, --can-go-down ] [ -U, --can-go-up ] [ -B, --can-go-up-and-down ] [ -a, --stay-on-same-address ]
[ -d, --stay-on-same-domain ] [ -l, --stay-on-same-tld ] [ -e, --go-everywhere ] [ -%H, --debug-headers ] [ -%!, --disable-security-limits ]
[ -V, --userdef-cmd ] [ -%U, --user ] [ -%W, --callback ] [ -K, --keep-links[=N] ] [

DESCRIPTION
httrack allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and
other files from the server to your computer. HTTrack arranges the original site's relative link structure. Simply open a page of the "mirrored" website in
your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume
interrupted downloads.
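
As a quick illustration of the description above, a typical mirror run combining a few of the options documented below might look like this, reusing the example host from the EXAMPLES section and a hypothetical output directory:

httrack http://www.someweb.com/ -O /tmp/someweb-mirror -r3 -v

Here -O sets where the mirror and its logs/cache are written, -r3 limits the recursion depth to 3, and -v prints progress on screen.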

EXAMPLES
httrack www.someweb.com/bob/
mirror site www.someweb.com/bob/ and only this site

httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg -mime:application/*
mirror the two sites together (with shared links) and accept any .jpg files on .com sites

httrack www.someweb.com/bob/bobby.html +* -r6
get all files starting from bobby.html, with a link depth of 6, and the possibility of going everywhere on the web

httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
runs the spider on www.someweb.com/bob/bobby.html using a proxy

httrack --update
updates a mirror in the current folder

httrack
will bring you to the interactive mode

httrack --continue
continues a mirror in the current folder

OPTIONS
General options:
-O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)

-%O chroot path to, must be r00t (-%O root_path) (--chroot <param>)
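
For example, an illustrative command (with a hypothetical local path) that sends the mirror and its cache/logfiles to a chosen directory with -O:

httrack http://www.someweb.com/ -O /home/user/mirrors/someweb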


Action options:
-w *mirror web sites (--mirror)

-W mirror web sites, semi-automatic (asks questions) (--mirror-wizard)

-g just get files (saved in the current directory) (--get-files)

-i continue an interrupted mirror using the cache (--continue)

-Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
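
For example, -g can be used to fetch a single file into the current directory rather than mirroring a whole site (illustrative URL):

httrack -g http://www.someweb.com/bob/report.pdf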


Proxy options:
-P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)

-%f *use proxy for ftp (%f0 don't use) (--httpproxy-ftp[=N])

-%b use this local hostname to make/send requests (-%b hostname) (--bind <param>)
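
For example, reusing the proxy host from the EXAMPLES section, an authenticated proxy can be supplied with -P (illustrative credentials):

httrack http://www.someweb.com/ -P user:pass@proxy.myhost.com:8080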


Limits options:
-rN set the mirror depth to N (* r9999) (--depth[=N])

-%eN set the external links depth to N (* %e0) (--ext-depth[=N])

-mN maximum file length for a non-html file (--max-files[=N])

-mN,N2 maximum file length for non-html (N) and html (N2)

-MN maximum overall size that can be uploaded/scanned (--max-size[=N])

-EN maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])

-AN maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])

-%cN maximum number of connections/seconds (*%c10) (--connection-per-second[=N])

-GN pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])

-%mN maximum mms stream download time in seconds (60=1 minute, 3600=1 hour) (--max-mms-time[=N])
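
For example, an illustrative run limited to a depth of 5, roughly 50 MB of downloaded data and a 100 KB/s transfer rate could combine these options:

httrack http://www.someweb.com/ -r5 -M50000000 -A100000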


Flow control:
-cN number of multiple connections (*c8) (--sockets[=N])

-TN timeout, number of seconds after which a non-responding link is shut down (--timeout)

-RN number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])

-JN traffic jam control, minimum transfer rate (bytes/seconds) tolerated for a link (--min-rate[=N])

-HN host is abandoned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
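
For example, an illustrative command using 4 simultaneous connections, a 30 second timeout and 2 retries:

httrack http://www.someweb.com/ -c4 -T30 -R2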


Links options:
-%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])

-n get non-html files 'near' an html file (ex: an image located outside) (--near)

-t test all URLs (even forbidden ones) (--test)

-%L <file> add all URLs located in this text file (one URL per line) (--list <param>)

-%S <file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
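
For example, assuming a plain text file named urls.txt (hypothetical name, one URL per line), its entries can be added with -%L:

httrack -%L urls.txt -O /home/user/mirrors/batch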


Build options:
-NN structure type (0 *original structure, 1+: see below) (--structure[=N])

or user-defined structure (-N "%h%p/%n%q.%t")

-%N delayed type check, don't make any link test but wait for files download to start instead (experimental) (%N0 don't use, %N1 use for unknown extensions, * %N2 always use)

-%D cached delayed type check, don't wait for remote type during updates, to speed them up (%D0 wait, * %D1 don't wait) (--cached-delayed-type-check)

-%M generate a RFC MIME-encapsulated full archive (.mht) (--mime-html)

-LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])

-KN keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links) (--keep-links[=N])

-x replace external html links by error pages (--replace-external)

-%x do not include any password for external password protected websites (%x0 include) (--disable-passwords)

-%q *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)

-o *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)

-X *purge old files after update (X0 keep delete) (--purge-old[=N])

-%p preserve html files 'as is' (identical to '-K4 -%F ""') (--preserve)
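
For example, the user-defined structure string shown above can be passed directly on the command line:

httrack http://www.someweb.com/ -N "%h%p/%n%q.%t"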


Spider options:
-bN accept cookies in cookies.txt (0=do not accept, * 1=accept) (--cookies[=N])

-u check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])

-j *parse Java Classes (j0 don't parse, bitmask: |1 parse default, |2 don't parse .class |4 don't parse .js |8 don't be aggressive) (--parse-java[=N])

-sN follow robots.txt and meta robots tags (0=never, 1=sometimes, * 2=always, 3=always (even strict rules)) (--robots[=N])

-%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)

-%k use keep-alive if possible, greatly reducing latency for small files and test requests (%k0 don't use) (--keep-alive)

-%B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)

-%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)

-%u url hacks: various hacks to limit duplicate URLs (strip //, www.foo.com==foo.com..) (--urlhack)

-%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)

can also be used to force a specific file type: --assume foo.cgi=text/html

-@iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])

-%w disable a specific external mime module (-%w htsswf -%w htsjava) (--disable-module <param>)
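
For example, an illustrative test crawl that ignores robots.txt and refuses cookies (use such settings responsibly):

httrack http://www.someweb.com/ -s0 -b0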


Browser ID:
-F user-agent field sent in HTTP headers (-F "user-agent name") (--user-agent <param>)

-%R default referer field sent in HTTP headers (--referer <param>)

-%E from email address sent in HTTP headers (--from <param>)

-%F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]") (--footer <param>)

-%l preferred language (-%l "fr, en, jp, *") (--language <param>)
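
For example, a custom user-agent string and a preferred-language header can be sent like this (illustrative values):

httrack http://www.someweb.com/ -F "Mozilla/5.0 (compatible; mymirror)" -%l "en, fr, *"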


Log, index, cache
-C create/use a cache for updates and retries (C0 no cache, C1 cache is prioritary, * C2 test update before) (--cache[=N])

-k store all files in cache (not useful if files on disk) (--store-all-in-cache)

-%n do not re-download locally erased files (--do-not-recatch)

-%v display on screen filenames downloaded (in realtime) - * %v1 short version - %v2 full animation (--display)

-Q no log - quiet mode (--do-not-log)

-q no questions - quiet mode (--quiet)

-z log - extra infos (--extra-log)

-Z log - debug (--debug-log)

-v log on screen (--verbose)

-f *log in files (--file-log)

-f2 one single log file (--single-log)

-I *make an index (I0 don't make) (--index)

-%i make a top index for a project folder (* %i0 don't make) (--build-top-index)

-%I make a searchable index for this mirror (* %I0 don't make) (--search-index)
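
For example, an illustrative quiet run that still writes extra information to the log files:

httrack http://www.someweb.com/ -q -z -f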


Expert options:
-pN priority mode: (* p3) (--priority[=N])

p0 just scan, don't save anything (for checking links)

p1 save only html files

p2 save only non-html files

*p3 save all files

p7 get html files before, then treat other files

-S stay on the same directory (--stay-on-same-dir)

-D *can only go down into subdirs (--can-go-down)

-U can only go to upper directories (--can-go-up)

-B can both go up&down into the directory structure (--can-go-up-and-down)

-a *stay on the same address (--stay-on-same-address)

-d stay on the same principal domain (--stay-on-same-domain)

-l stay on the same TLD (eg: .com) (--stay-on-same-tld)

-e go everywhere on the web (--go-everywhere)

-%H debug HTTP headers in logfile (--debug-headers)
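
For example, an illustrative link-checking pass that scans without saving anything while staying on the same TLD:

httrack http://www.someweb.com/ -p0 -l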


Guru options: (do NOT use if possible)

