Broken Links Retrieval: 3 - The Basic Java Instructions
by Lionel
Part 3 - The Basic Java Instructions
HttpUnit's documentation is very clear and simple; you can also
subscribe to a mailing list. This makes it easy to develop your code. We
will not go into the details of this code. We developed our own classes.
You can download them and rely on this layer to build your own applications.
1. The Core Program
Our program is a standalone Java application that runs from a client workstation.
Of course, we could put this code in a Java agent. The following is the
code of the Basic.java file:
import lotus.notes.*;
import java.io.*;
import java.net.*;
import java.lang.*;
import java.util.*;
import com.meterware.httpunit.*;

public class Basic {
    public static void main(String[] argv) {
        try {
            NotesThread.sinitThread();
            Session S = Session.newInstance();
            // ********************************************
            // JAVA PROGRAM -- Start
            // ********************************************

            // ********************************************
            // JAVA PROGRAM -- End
            // ********************************************
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    } // end main
} /* End class Basic */
This program does not perform any action: it only initiates a Notes session.
We will write the code between the two big blocks of comments.
The import statements at the top of the file refer to the packages we
need; the most important are lotus.notes.* and com.meterware.httpunit.*.
If you run this program on the server, you should replace lotus.notes.*
with lotus.domino.*.
Try to compile this program using the command line we described in the previous
chapter. If the compilation fails, there is a problem in
your configuration. Otherwise, you can proceed with the
following steps.
2. Our Custom Classes: LCClient and LCResponse
We have created two classes that rely on the HttpUnit package. LCClient
initiates a new HTTP session. The main purposes of this class are to authenticate
with the Domino server and to define the proxy server's information.
Authenticating is necessary because many documents are protected and
therefore cannot be accessed anonymously. Without authentication,
the HTTP client would raise too many 404 or 401 errors.
Notes:
- If you use Session-based Authentication and try to access a protected
document, the server returns a 404 (Not Found) error and not a 401 error
(Unauthorized).
- LCClient only works with Session-based authentication (see "Session-based
name-and-password authentication for Web clients" in Administration
Help).
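The first note matters when you interpret a status code: under Session-based authentication, a 404 does not necessarily mean that the document is gone. The following is a minimal sketch of that interpretation. The StatusReader class, the Verdict enum and the authenticated flag are hypothetical names used for illustration; they are not part of Checker.java. Treating every 2xx and 3xx code as valid is also an assumption.

```java
// Hypothetical helper, not part of Checker.java: interprets an HTTP status
// code in light of the Session-based authentication note above.
final class StatusReader {

    enum Verdict { OK, PROTECTED_OR_MISSING, MISSING, UNAUTHORIZED, OTHER_ERROR }

    // 'authenticated' is an assumed flag: true once the client has logged in.
    static Verdict read(int code, boolean authenticated) {
        if (code >= 200 && code < 400) {
            // Assumption: success and redirect codes both count as a valid link.
            return Verdict.OK;
        }
        if (code == 404) {
            // Without a login, a 404 may only mean that the document is protected.
            return authenticated ? Verdict.MISSING : Verdict.PROTECTED_OR_MISSING;
        }
        if (code == 401) {
            return Verdict.UNAUTHORIZED;
        }
        return Verdict.OTHER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(read(404, false));
        System.out.println(read(404, true));
    }
}
```

With such a helper, a 404 received by an anonymous client is reported as ambiguous instead of being counted as a broken link right away.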
The other class, LCResponse, helps us manage the response to an HTTP
request. It provides the page title, the response code
(200, 404, ...), the page headers and, most importantly, the links
of the page.
I did not include these classes in the present document. They are available
in the file Checker.java.
Simply download this file to view the code. Instructions to customize
the code for your own needs appear later.
3. Understanding the verification process
The following sample code shows how to make a simple check.
Comments appear below the code.
String myURL = "http://server.domain.com/db/mydb.nsf/html/home";
LCClient myClient = new LCClient("http://server.domain.com",
    "proxy.domain.com", 8080, "John Doe", "mypassword");
LCResponse rep = new LCResponse();
rep = myClient.GetResponse(myURL, false);
System.out.println("Status Code: " + rep.Code);
The instruction that creates myClient instantiates a new LCClient object. It
allows us to define the proxy server and proxy port. The server root URL
tells the program which Domino server we will authenticate with.
You can authenticate to several web servers provided that you have a
valid account and that authentication is done through a form.
The user name and password are used for both the Domino authentication
and the proxy authentication. We assume that you have a single account
for both servers. It is important that the Domino account has a high access
level; otherwise, the program will not be able to open the page and
will assume that it is a broken link.
The next instruction simply declares a new LCResponse object. We will
use it to collect information about the page we are testing.
Then we retrieve the page with the GetResponse method. This method has
two arguments: a URL to check and a boolean that specifies whether we
should retrieve the list of links inside the page. If true, the program
will build an array of links. This takes a little more time.
Finally, we print the HTTP status code. Easy, isn't it?
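To see what such a check boils down to without the LCClient layer, here is a rough sketch that uses only the JDK's java.net package. It is not the code from Checker.java: it performs no Domino or proxy authentication, so protected pages would be reported as errors. PlainCheck, httpProxy and statusOf are hypothetical names, and the ten-second timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical stand-in for the LCClient/LCResponse pair, built on the JDK only.
// It performs no Domino or proxy authentication.
final class PlainCheck {

    // Describes an HTTP proxy without resolving its host name yet.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    // Requests the URL through the proxy and returns the HTTP status code.
    static int statusOf(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary value
        conn.setReadTimeout(10_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Proxy proxy = httpProxy("proxy.domain.com", 8080);
        String myURL = "http://server.domain.com/db/mydb.nsf/html/home";
        System.out.println("Status Code: " + statusOf(myURL, proxy));
    }
}
```

The host names are the placeholders used in this article; replace them with your own server and proxy before running the main method.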
4. Verifying links within a page
Let's modify the previous program so that it verifies each link inside
the page:
String myURL = "http://server.domain.com/db/mydb.nsf/html/home";
LCClient myClient = new LCClient("http://server.domain.com",
    "proxy.domain.com", 8080, "John Doe", "mypassword");
LCResponse rep = new LCResponse();
rep = myClient.GetResponse(myURL, true);
System.out.println("Status Code: " + rep.Code);
WebLink Link;
for (int k = 0; k < rep.Links.length; k++) {
    Link = rep.Links[k];
    // Build the request for this link in order to obtain its URL
    WebRequest req = Link.getRequest();
    try {
        String Lnk = req.getURL().toString();
        LCResponse rep2 = myClient.GetResponse(Lnk, false);
        System.out.println("Result: " + rep2.Code);
    } catch (MalformedURLException ue) {
        System.out.println("Error: " + ue.toString());
    }
}
This program opens the web page, prints the status code and, for each
link inside the page, verifies whether it is still valid.
The LCResponse object contains an array of WebLink objects. WebLink is
a class from HttpUnit that defines a link. In order to get the URL of the
link, we use a WebRequest object. I advise you not to use the getURLString()
method of the WebLink object, since it may return a relative URL.
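The difference between a relative and an absolute URL is easy to see with the JDK alone. The sketch below resolves the href of a link against the URL of the page that contains it, which is the kind of work the WebRequest object spares us. LinkResolver and toAbsolute are hypothetical names, and the href values are made up for the example.

```java
import java.net.URI;

// Hypothetical helper: turns the href of a link into an absolute URL,
// resolved against the URL of the page that contains the link.
final class LinkResolver {

    static String toAbsolute(String pageUrl, String href) {
        // URI.resolve leaves an href that is already absolute unchanged.
        // URI.create throws IllegalArgumentException on malformed input
        // (for example an href that contains spaces).
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://server.domain.com/db/mydb.nsf/html/home";
        System.out.println(toAbsolute(page, "contact"));
        System.out.println(toAbsolute(page, "/help/help.nsf"));
    }
}
```

A relative href such as "contact" cannot be requested on its own; it only becomes a usable URL once it is combined with the address of the page it came from.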
5. Limits and enhancements
Our HTTP client is still very basic. It performs only a few tests: for
example, if a link is located inside JavaScript code, the program will
raise an error. This is also the case if the response is not an HTML page
(a PDF file, for example).
There is a lot of work to do in order to identify these issues and solve
them. This is beyond the scope of this short article.
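As a first small step in that direction, one could filter links before checking them and look at the content type before parsing a response. The sketch below is only an illustration under assumptions: LinkFilter, isCheckable and isHtml are hypothetical names, and skipping mailto: links and page-internal anchors is my own addition to the two cases mentioned above.

```java
import java.util.Locale;

// Hypothetical pre-checks, not part of Checker.java.
final class LinkFilter {

    // Returns false for hrefs that cannot be fetched as a page over HTTP.
    static boolean isCheckable(String href) {
        if (href == null) {
            return false;
        }
        String h = href.trim().toLowerCase(Locale.ROOT);
        if (h.isEmpty() || h.startsWith("#")) {
            return false; // empty href or anchor inside the current page
        }
        // Assumption: javascript: and mailto: links are skipped, not reported.
        return !(h.startsWith("javascript:") || h.startsWith("mailto:"));
    }

    // Returns true when a Content-Type header value denotes an HTML page.
    static boolean isHtml(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("text/html");
    }

    public static void main(String[] args) {
        System.out.println(isCheckable("javascript:history.back()"));
        System.out.println(isHtml("application/pdf"));
    }
}
```

Links rejected by isCheckable would simply be skipped, and a response for which isHtml returns false would be checked for its status code only, without trying to extract links from it.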
We learned how to verify a link and how to collect and verify the links
located in a given page. We now have to combine this technique with our Domino
database. Let's go!