Java Program: Trie

Description:

You are to write a Java program that implements a variation of a tree structure known as a trie. A trie is a particularly efficient structure to use for counting the occurrences of words. It is a 26-ary tree, meaning that each node has 26 children. Each node stores a counter and an array of 26 references to children (some or all of which might be null).

Criteria:

  • Your trie is required to support the following public methods:
    • Trie(stream): Constructs a Trie by reading from an InputStream until reading fails
    • uniqueWords(): Returns a count of the number of unique words inthe input stream
    • totalWords(): Returns a count of the total number of words in the input stream
    • display(min): Display all words, one per line, in alphabetical order that have a count of at least min (each line of output should have the word count, a tab and then the word)
  • The methods uniqueWords and totalWords should have O(1) running time.
    • For this assignment a word is defined to be a contiguous sequence of letters.
    • That means that contractions like “isn’t” and “I’m” are not considered words (instead, they would be broken into “isn”, “t”, “I” and “m”).
    • This definition similarly does not allow for hyphenated words like “merrygo-round.”
  • In processing the input file, whenever you encounter a letter, you should continue to read letters until you encounter something that is not a letter.
    • That non-letter will signal the end of the word.
    • You should convert all letters to lowercase for the purposes of this assignment so that we end up with a 26-ary tree instead of a 52-ary tree.
  • In processing letters, you might find it useful to use the isLetter and toLowerCase methods defined in the Character class.
    • You also might want to use Stuart’s TextReader class, which is included in the folder for this assignment.
    • You will not, however, want to use the readWord method in Stuart’s TextReader class because it uses a different definition of a word than that given above.
  • Because your Trie constructor takes a parameter of type InputStream, you will need to include the following at the beginning of your code file for class Trie: import java.io.*;
    • You will be writing two classes: the Trie class and a node class used to implement the trie. You are expected to use good style in writing these classes (comments, good variable names, constants when appropriate, etc)
  • A driver program is provided for testing your classes.
  • For this assignment, you should execute your program from a DOS shell, redirecting input from an input file.
    • For example, using the driver program and the hamlet input file, you’d give this command in the DOS shell: java WordCount < hamlet.txt

Conclusion:

During my data structures classes I had the chance to create and explore many different ideas of how to store and parse data. This was one of my favorites from the class hence the reason I have posted it here. It is a little different than the other programs I have posted here, it is not designed to be user friendly, more to be used as a tool by another program.

Links:

Comments: [Add a Comment]

Creative Commons License   This webpage and all of its contents are licensed under a
Creative Commons License by Dr-EagleEyes.com.