Project 5 consists of (a) writing a Java application that performs timing measurement on sorting and tree traversal algorithms, then (b) summarizing experimental results and analyses in a 4-5 page report.
Important Note: This should not entail writing of new sorting or tree traversal code, as you already did this for Projects 3 and 4. Hence, the effort for this project involves (i) inserting timing calls into your code, (ii), measuring runtime for different-sized inputs, (iii) benchmarking various operations in Java on your computer, (iv) constructing a complexity budget and prediction of runtime from your previous analysis of algorithms (homework and class notes), and (v) comparing the measured and predicted performance.
Answers to student questions are included below in red typeface, to distinguish them from the assignment text. Note the clarification of total computation time versus total algorithm runtime.
The program you develop as Project 5 will (a) read a sequence of
integers from stdin
as in Project 1, (b) store the
sequence in an array or linked list, then (c) according to two
command-line switches, run a sorting or tree traversal algorithm using
various data structures, as you did for Projects 3 and 4. While
running the program, you will (d) measure timing data for enough
iterations of the algorithm to ensure 99 percent accuracy (one percent
error) given the one millisecond resolution of the UNIX timing
function(s). Your program will (e) output the correct information for
the algorithm or data structure selected, per Programming Assignments
3 and 4 specification, then will (f) output the measured or computed
time intervals (specified below), obtained by dividing the measured
times by the number of iterations. An example of the input file for
Project #5 to be used for all algorithms in this project, is given
below.
When the TAs test the program, the numbers will be piped in from a
text file (denoted by inFile
), and the command
line will be as follows:
java Proj5 dswitch aswitch < inFile
dswitch
is one of the following switches:
-avl
for AVL search tree or
-bst
for BST, -arr
for array or -dll
for doubly-linked list;
and aswitch
is one of the following switches:
-b
for BFS or -d
for DFS, -h
for histogram-sort, -i
for insertion-sort, or
-m
for merge-sort.
Clearly, BFS and DFS can only be
used with AVL search tree or BST, and array and DLL can only be
used with one of the sorting algorithms. If you want to have a
better chance of getting points for the quality of your code,
then construct an error detection and reporting method that will detect
mismatches between the dswitch
and
aswitch
values (i.e., conditions that do not
meet the preceding structure-method associations).
Note that inFile
, a text file that you enter
using a word processor, will be stored on disk. Your program will read
the input sequence by command-line piping of that file to
stdin
.
More information about the test procedure is posted below under Example Test Procedure.
stdin
piped from a
text file whose name is specified on the command line. The
input file must be structured as a stream of positive or
negative integers (up to eight digits per number), one per line,
followed by an "E" when the input sequence is completed.
Following the "E" will be a line with the number to be deleted
from the tree, in the case of AVL or BST only. In the
sorting methods, this last value will be ignored.
Step 2. Write the following header to the screen:
your-name your-SSN 5 August 1999 COP3530-C99-Proj5Then, write the first and last 50 numbers of the input sequence to the screen, ten integers per line, using the code you developed for Projects 2 and 3 (If the sequence is shorter than 50 numbers, just write the sequence itself):
Input Sequence: 4 2 9 3 5 6 7 8 10 11 1 12 14 13The screen will display the following information after the header, and separated from the header by two blank lines:
Data Structure: (whatever one dswitch specifies) Algorithm: (whatever one aswitch specifies) Numbers Read In: 14 Input Sequence: 4 2 9 3 5 6 7 8 10 11 1 12 14 13
Step 3. Depending on the dswitch or aswitch flags, perform sorting or graph traversal, with appropriate output, according to the Programming Assignment 3 and 4 descriptions. However, you should write the tree-printing output to the screen only if there are fewer than 51 elements in the input. Otherwise, you cannot fit the tree on the screen. The comments about input sequence output in the preceding step also apply to output of the sorted sequence or of DFS/BFS traversal (e.g., first 100 vertices and last 100 vertices).
If you have problems displaying the tree, then you may print either fewer than 30 items, or you may elect not to print at all. Those who print out their trees will have a better chance at partial credit.
Step 4. You will need to wrap the code in timing loops, as described in class, in order to measure the following times for each method and data structure. Example code for timing call insertion is given in this clickable link prepared for Dr. Crummer's class. A description of component times follows:
Screen I/O Time -- Time required to write the header and output information to the screen. This does not include the time required to write timing data to the screen.
Data Structure Maintenance Overhead -- Time required to put values into and take them out of data structures that are central to the algorithm of regard. This means array or DLL maintenance in sorting, and element insertion or deletion cost in BST and AVL construction, as well as traversal cost. As discussed in class, this also means the time required to put input values into a data structure, for example, building an AVL tree.
Hint: Run your BST/AVL tree insertion/deletion ADT operations separately to find out how much each one costs for different sized trees. Then, accumulate the number of calls to each ADT operation, as you were shown how to do with branch-taken counts for IF statements, in class.
Total Algorithm Time -- measured from the time that you first invoke the algorithm to the time that it completes its work (including screen I/O).
Estimated Computation Time -- computed (not measured) as the Total Algorithm Time minus the (File I/O + Screen I/O + Data Structural Overhead) times measured or computed previously.
Step 5. After you have the five time intervals specified in Step 4, write them to the screen using the following output format:
Timing Data: File I/O....................: ____time____ Screen I/O..................: ____time____ Data Structural Overhead....: ____time____ Estimated Computation Time..: ____time____ ------------------------------------------ TOTAL ALGORITHM TIME ____time____where
____time____
denotes a timing value in seconds.
For example, if Screen I/O takes 3.29 milliseconds, then your program
would write "0.00329 seconds" beside that category.
Step 6. After you have tested and completed your program, use the submission procedure outlined below to submit your project.
Step 7. After you have submitted your program, run it for at least seven different sizes of input (e.g., 10, 20, 50, 100, 200, 500, and 1000 inputs) to obtain performance data. If you want a better chance at a good grade, then try 2000, 5000, and even 10,000 input points. After you have your performance data, then write a report that has the above-listed timing data for each permissible combination of aswitch and dswitch values, as well as for each input size. A recommended tabular format follows:
Measured or Computed Timing Data ------------------------------------------------------------- Input Size File I/O Screen I/O ADT Overhead Computation TOTAL TIME ---------- -------- ---------- ------------ ----------- ---------- 10 __time__ __time__ __time__ __time__ __time__ 20 __time__ __time__ __time__ __time__ __time__ : : : : : : 10000 __time__ __time__ __time__ __time__ __time__
Graph each of the times listed above, and the total time, using log-scaled abscissa (and ordinate, if necessary).
For each structure-method pair, use the theory we developed in class and in the homeworks to predict the runtime of algorithm. In order to build up your computational work budgets, you will need to measure the runtime of various operations, and construct a computational budget for each algorithm. You can do this with little Java programs that you write (no need to turn them in), as discussed in class.
In the report, compare and contrast predicted performance with measured performance. For example, if AVL tree insertion is supposed to take O(log n) time, but your measured data indicate otherwise, state this. Then, based on the computational budget and runtime prediction you constructed (per the previous paragraph, like you did with Data Structure Maintenance Overhead), analyze why this prediction was not met by the measured data. (For example, your I/O time might be much larger than you thought, or you might have coded the ADT operations so they take many more comparisons than are needed for efficient operations). Discussion should center on the data, and why it meets or does not meet your theoretical predictions. Do not make sweeping generalizations or conjectures without solid analytical background and citation of the measured data.
Example Test Procedure. Type up a text file "inFile" in either Dos or Unix with one value per line, like this:
22 [Input number 1] 36 [Input number 2] 54 [Input number 3] 14 [Input number 4] 17 [Input number 5] 65 [Input number 6] E [Specifies end of input sequence] 14 [Number to be deleted from tree -- ignored by sorting methods]
When making a big file (e.g., 10,000 inputs) you should write a program to generate unique random numbers. You can do this with two nested for loops whose indices reference primes. Hint: Recall the Fundamental Theorem of Mathematics from Discrete Math class.
Now run your program this way:
java Proj5 dswitch aswitch < inFile
where dswitch
and aswitch
were described previously.
Required Programming Procedure. Define each class in a
separate .java
file, with your master file as
Proj5.java
, where Proj5, which has a main method,
is the class name that the TAs will use to run your program. (It is
o.k. to have a main method in each class - the TAs are primarily
interested in the main
method in the Proj3
class.)
Other Hints on Programming Procedure. Use the following steps to guide your implementation of the Java version of the preceding program.
Hint 1. Get started early!!! If you wait until the last two or three days, you will have a disaster, since no late submission is allowed except for the reasons cited below.
Hint 2. Decide what classes you will use. It is strongly
suggested that you adapt Dr. Sahni's classes as shown in the text, to
avoid undue programming effort. You can get code from Dr. Sahni's Web
page at http://www.cise.ufl.edu/~sahni/dsaaj/
, but you
must not copy the code verbatim, instead adapting it for your
use. After you have figured out what classes and objects to use,
design your Java class specification(s) on paper. Then (and only
then) should you start coding.
Hint 3. Code your class(es) and methods, testing them incrementally (i.e., one at a time). Get each one working before you go on to the next one. If you write your program all in one blob of code, you will assuredly have difficulty debugging it.
Hint 4. Try your program on different kinds of output (e.g., large and small inputs, with both negative and positive integers). If your code breaks, fix it until it works on all inputs. Use the pipe construct on the command line to get the input into your application.
Hint 5. Use the streaming I/O reader developed for Project 3. If you developed another kind of reader for Project 3, you must modify it so it meets this specification. Deviations will be considered erroneous.
Hint 6. The TAs are here to help you with your problems. We have scheduled office hours throughout the week, with the exception of the Independence Day Holiday. Please feel free to ask the TAs any questions about your Java program.
Documentation. Illustrate your program with in-line comments, so the TAs know what you are doing. Consult your textbook and other Java programming references provided on the course Web page and elsewhere (e.g., Java books in library) for style guides.
Evaluation. Maximum score = 150 points, as follows:
For each of the ten structure-method pairs (six for sorting, four for trees): Code present and clear 1 points max. Code compiles correctly 1 points max. Cmd-line switches work correctly 1 points max. Screen output displayed correctly 1 points max. Timing calls inserted correctly 3 points max. Correct computation of timing data 2 points max. Correct display of timing data 1 points max. TOTAL POINTS PER DataStructure/Method Pair: 10 points max. ---------------------------------------------------------------- TOTAL PROGRAM POINTS 100 points max. For the Report: (For each of ten structure-method pairs) Tabular data for structure-method pair 1 points max. Graph of timing data for each s-m pair 1 points max. Work budget for structure-method pair 1 points max. Complexity prediction for each sm pair 1 points max. Comparison of prediction and data 1 points max. --------------------------------------------------------------- TOTAL POINTS PER DataStructure/Method Pair: 10 points max. TOTAL REPORT POINTS 50 points max.
Note that we have "raised the bar" on this assignment. For example, if your code is present as specified, but doesn't compile and isn't documented, you would get 1 point per structure-method pair for which the code is present (10 points max.) If the code is present, well documented, compiles, reads in data, and outputs the result correctly (but doesn't do any timing measurement or data display) then you could get as many as 40 points total.
Submission Procedure. Please submit only your *.java files. You must be logged in to your CISE account to perform the submission procedure, which is described in detail at this link.
In summary, type in the following commands within your directory to submit all the java files for Project 5:
turnin -c cop3530 -p proj5 *.java
{Return}
Important Note: In case you have developed your programs on any environment other than the CISE Unix system, please test your program on a CISE Unix machine before submission. We will only be testing programs for grading purposes on the CISE Unix machines. The excuse that you could not get into the CISE Lab will not be accepted, since you can remote log-in to all CISE and CIRCA labs 24 hours per day.
Special Note from Lloyd Noronha:
Before starting any programming project, please make sure that
there are no unnecessary files in your "dev" directory. Make sure you
only submit the files necessary for compiling and running your
project. You will lose points for compilation errors in any file you submit,
even if it is not relevant. We will compile and run your project as
follows :
So please make sure that you test run your project as above in your
directory just before your submission.
javac *.java
java Proj5 -avl -i < inFile
Due Date. Submit the completed project by MIDNIGHT on Thursday 5 August 1999. No late submissions will be permitted, except for documented excuses (medical, police, fire, judicial, death of immediate family member, or military service).
This means that the submission program will be turned off on midnite of the due date. Due to tight grade submission deadlines for the Summer-C Semester, we have to allow enough time to grade the assignments (and final exams).
This concludes the description of Project #5. Use the E-mail link at the top of this Web page to ask the TAs, if you have any questions about programming or grading.