The goal of this assignment is to demonstrate your mastery of data structures by

Published by John M. Re

Nov 27, 2021


The goal of this assignment is to demonstrate your mastery of data structures by: 1) creating a new data
structure; and 2) using that data structure as part of a larger software system. This assignment has one
implementation requirement and one evaluation requirement, both detailed below.
Background
Implement a combination spell checker and word prediction utility in American English. A spell checker is
a software feature often built into word processors like Google Docs, in browser input boxes, etc. The
spell checker detects misspelled words and has the ability to suggest alternatives. Most spell checking
software is built on a lot of knowledge of how misspellings occur — for example, the word “bizarre” is
frequently misspelled with one “r” (“bizare”) — and is often guided by word frequency counts. Similarly,
word prediction anticipates the word being typed based on the first letters in the word.
There are other kinds of word prediction utilities, like predictive text, which anticipate the next word
based on the previous words. Predictive text is not in scope for this assignment.
The utilities will be combined and implemented using a trie, a tree-based data structure not discussed in
detail during class time. Once built, you will evaluate its performance on a “frequently misspelled”
database. Because the spell checker / word prediction required for this implementation is not built on a
state of the art model, our implementation will make mistakes and have unusual suggestions. You will
suggest improvements to the system in order to improve the spell checker’s performance.
Part 1 – Trie
Figure 1: Graphical representation of a trie, from https://en.wikipedia.org/wiki/Trie
Use a trie (pronounced either like “tree” or “try”) to implement a spell checker. Also known as a prefix tree,
a trie is a tree-based data structure. Figure 1 shows one example trie, appropriate for this implementation.
Each node in a trie may represent a type (word) or the partial spelling of a type. A leaf node in the trie may
represent the full spelling of a type, along with its frequency. In Figure 1 for example, leaf nodes (“a”, “to”,
“tea”, “ted”, “ten” and “inn”) are valid types in American English. The parent of a leaf node may represent
one of two items, either:
● The partial spelling of one or more types (e.g. “te” in Figure 1)
● The full spelling of a type (e.g. “i” or “in” in Figure 1)
Figure 1 represents one possible trie for spelling in which the entire prefix is stored in each node. The
path from root to leaf for a type like “inn” contains nodes () -> (i) -> (in) -> (inn). There are
implementations of a trie in which each node contains only the last letter of its prefix. In such an
implementation, the path from root to leaf for a type like “inn” would contain nodes () -> (i) -> (n) ->
(n). Either implementation is acceptable for this assignment.
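As a rough illustration of the second scheme (one letter per node), a trie might be sketched as follows. The names TrieNode, LetterTrie, insert and frequencyOf are hypothetical and are not required by the assignment; storing the frequency as a long is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node type: each edge carries one letter of the prefix. */
class TrieNode {
    final Map<Character, TrieNode> children = new HashMap<>();
    boolean isType = false; // true if the path from the root spells a full type
    long frequency = 0;     // corpus count, meaningful only when isType is true
}

/** Hypothetical trie wrapper; not the required class design. */
class LetterTrie {
    private final TrieNode root = new TrieNode();

    /** Adds a type and its frequency, creating nodes along the path as needed. */
    void insert(String type, long frequency) {
        TrieNode node = root;
        for (char c : type.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new TrieNode());
        }
        node.isType = true;
        node.frequency = frequency;
    }

    /** Returns the stored frequency, or -1 if the string is not a full type. */
    long frequencyOf(String type) {
        TrieNode node = root;
        for (char c : type.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return -1;
            }
        }
        return node.isType ? node.frequency : -1;
    }
}
```

In this sketch an interior node such as “in” can be a full type and a prefix of “inn” at the same time, which mirrors the two cases listed above.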
Part 2 – Data
Fill the trie with data from the file unigram_freq.csv. This file will be supplied to your implementation as the
first argument. This file is from Rachael Tatman’s English Word Frequency. It contains the 333,333 most
frequently used types from Google’s Trillion Word Corpus, along with the frequencies of those types, in
CSV format. Each type, including proper names like “Michelle”, is converted to lowercase, and there are
no repeated entries in the file.
The first 5 lines of this file are as follows:
word,count
the,23135851162
of,13151942776
and,12997637966
to,12136980858
The first line of this file describes each column. (This is common in CSV data files.) This line may be
ignored. All subsequent lines contain a type, a comma and an integer representing the frequency of the
type in the corpus (data set). The second line shows the most frequent type in the corpus is “the”, with
more than 23 billion tokens (i.e. used more than 23 billion times in the corpus).
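One possible way to read this file is sketched below. The class name UnigramLoader and the method load are hypothetical, and returning a Map rather than filling a trie directly is a simplification made for this sketch. Note that the largest counts (more than 23 billion) do not fit in a Java int, so the sketch parses them as long.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads "type,count" rows into an ordered map. */
class UnigramLoader {
    static Map<String, Long> load(Reader source) {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine(); // header row "word,count" is ignored
            while ((line = in.readLine()) != null) {
                int comma = line.lastIndexOf(',');
                if (comma <= 0) {
                    continue; // skip blank or malformed rows
                }
                String type = line.substring(0, comma).trim();
                // Counts exceed Integer.MAX_VALUE, so parse as long.
                long count = Long.parseLong(line.substring(comma + 1).trim());
                counts.put(type, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

With a real file, the reader could come from something like `new java.io.FileReader(args[0])`; each map entry would then be inserted into the trie.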
Part 3 – Mechanism for Check Spelling & Word Prediction
The overall class design is up to you, but your implementation must contain at least one Java class
named Spelling, which must expose at least one public function, suggest(…), which itself must
take two parameters, token and count, and must return a List of Lists of String instances. In other
words, the function in Spelling.java must have the following header:
public List<List<String>> suggest(String token, int count)
Assuming the parameter token contains n characters, for each character position i (1 .. n), the suggest(…)
function adds one List of count types to the result. The suggested types should be the most frequent types
which share the prefix of the input up to and including the ith character. Where no such prefix can be
found, the implementation must assume the parameter token is incorrectly spelled, and the most frequent
types sharing the prefix up to the point of misspelling should be used.
For example, if the parameter token has the value “onomatopoeia” and parameter count has the value
5, the returned List<List<String>> should be the following:
{ {“of”, “on”, “or”, “our”, “one”},
{“on”, “one”, “only”, “online”, “once”},
{“ona”, “onan”, “onalaska”, “onassis”, “onanie”},
{“onomatopoeia”, “onoma”, “onoml”, “onomichi”, “onomastics”},
{“onomatopoeia”, “onoma”, “onoml”, “onomichi”, “onomastics”},

{“onomatopoeia”, “onoma”, “onoml”, “onomichi”, “onomastics”}
}
The first inner List, which contains “of”, “on”, etc., represents the most frequent types starting with the
letter “o”. The second inner List, which contains “on”, “one”, etc., represents the most frequent types
starting with the prefix “on”. In this example, the fifth and subsequent Lists contain types like
“onomichi” which do not have the prefix “onoma”. This occurs because there are fewer than count (5)
types in the file unigram_freq.csv with the “onoma” prefix. In this case, the most frequent types with the
longest prefix are included.
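The behaviour described above could be sketched roughly as follows. This is one interpretation under stated assumptions, not the reference solution: the nested Node class and the add, topTypes and collect helpers are hypothetical, and the back-off rule (when a prefix has fewer than count types, fill the remainder from progressively shorter prefixes, down to the empty prefix) is this sketch's reading of “the most frequent types with the longest prefix are included”.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class Spelling {
    /** Hypothetical node: one letter per edge; type is null for pure prefixes. */
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        String type;
        long frequency;
    }

    private final Node root = new Node();

    /** Hypothetical helper used to fill the trie. */
    void add(String type, long frequency) {
        Node node = root;
        for (char c : type.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.type = type;
        node.frequency = frequency;
    }

    public List<List<String>> suggest(String token, int count) {
        List<List<String>> result = new ArrayList<>();
        List<Node> path = new ArrayList<>(); // path.get(k) is the node for the prefix of length k
        path.add(root);
        for (int i = 0; i < token.length(); i++) {
            // Extend the path only while every earlier character has matched.
            if (path.size() == i + 1) {
                Node next = path.get(i).children.get(token.charAt(i));
                if (next != null) {
                    path.add(next);
                }
            }
            result.add(topTypes(path, count));
        }
        return result;
    }

    /** Most frequent types under the longest matched prefix, backing off if too few. */
    private List<String> topTypes(List<Node> path, int count) {
        Set<String> chosen = new LinkedHashSet<>();
        for (int depth = path.size() - 1; depth >= 0 && chosen.size() < count; depth--) {
            List<Node> found = new ArrayList<>();
            collect(path.get(depth), found);
            found.sort(Comparator.comparingLong((Node n) -> n.frequency).reversed());
            for (Node n : found) {
                if (chosen.size() == count) {
                    break;
                }
                chosen.add(n.type); // the set ignores types already chosen
            }
        }
        return new ArrayList<>(chosen);
    }

    /** Gathers every full type in the subtree rooted at node. */
    private void collect(Node node, List<Node> out) {
        if (node.type != null) {
            out.add(node);
        }
        for (Node child : node.children.values()) {
            collect(child, out);
        }
    }
}
```

This sketch re-walks and re-sorts a whole subtree for every prefix, which is simple but not efficient; caching the top types per node is one alternative worth considering.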
In addition, your implementation must contain a “main” function which accepts two parameters which may
be passed to the suggest(…) function:
● The name and location of the unigram_freq.csv file
● The count parameter
Assuming the class containing your “main” function is named A2, your implementation must be called as
follows on a MacOS / Unix system:
java A2 ../data/unigram_freq.csv 5
(The path would be specified differently on a Windows system.)
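A minimal sketch of the command-line handling is shown below. The helper parseCount and the usage message are hypothetical, and the sketch only validates the two arguments; loading the file and calling suggest(…) are left out because their design is up to you.

```java
class A2 {
    /** Hypothetical helper: validates the arguments and returns the count. */
    static int parseCount(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "usage: java A2 <path to unigram_freq.csv> <count>");
        }
        int count = Integer.parseInt(args[1]);
        if (count < 1) {
            throw new IllegalArgumentException("count must be positive");
        }
        return count;
    }

    public static void main(String[] args) {
        int count = parseCount(args);
        String csvPath = args[0];
        // A full implementation would build the trie from csvPath here
        // and pass count through to suggest(...).
        System.out.println("File: " + csvPath + ", count: " + count);
    }
}
```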
Part 4 – Improvements (Thought Exercise)
Test your implementation with all the data from the file misspelling.csv, which contains some of the most
frequently misspelled types in English in mixed case, according to the Oxford English Corpus. Observe
the number of times the correct spelling appears in the List<List<String>> returned from the
suggest(…) function, varying the value of the count variable between 3 and 7 [1].
What, if anything, could be changed in the implementation to get the correct spelling of the input tokens?
[1] Why numbers between 3 and 7? The lower limit, 3, is what current phone and tablet interfaces have
settled on. The upper limit is 7 items because psychologists have determined that we may have a
maximum capacity for processing information, the same reason phone numbers started with at most 7
digits. Of course, no one really uses phone numbers any more.
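The hit count described in Part 4 could be tallied with something like the sketch below. The layout of misspelling.csv is not given here, so the sketch assumes the pairs have already been read into a Map from misspelled token to correct spelling; the class MisspellingEval, the method countHits and the lowercasing step are all assumptions of this sketch.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical evaluation helper; the suggester stands in for Spelling.suggest. */
class MisspellingEval {
    static int countHits(Map<String, String> misspelledToCorrect,
                         BiFunction<String, Integer, List<List<String>>> suggester,
                         int count) {
        int hits = 0;
        for (Map.Entry<String, String> pair : misspelledToCorrect.entrySet()) {
            // The frequency data is lowercase, so both sides are lowercased (an assumption).
            String token = pair.getKey().toLowerCase(Locale.ROOT);
            String correct = pair.getValue().toLowerCase(Locale.ROOT);
            boolean found = false;
            for (List<String> perPrefix : suggester.apply(token, count)) {
                if (perPrefix.contains(correct)) {
                    found = true;
                    break;
                }
            }
            if (found) {
                hits++;
            }
        }
        return hits;
    }
}
```

Given an instance named spelling, a call such as `MisspellingEval.countHits(pairs, spelling::suggest, 5)` could be repeated for each count from 3 to 7 to compare the results.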
Submission
No starter code is provided. Check your Java code — including any main function, interfaces and class
implementations — into a GitHub repository. Do not check in the data files (unigram_freq.csv or
misspelling.csv), but you may refer to them in comments or in the auxiliary files of your repository. Submit
the link to this GitHub repository on Canvas.
Grading
Your grade for this assignment will be determined as follows:
● 60% = Implementation: your class implementations must run successfully with valid input arrays.
The implementation must produce the expected results. Any deviation from the expected results
results in 0 credit for implementation. Each portion of the implementation (Trie, Data, Mechanism)
is equally weighted.
● 15% = Improvements (Thought Exercise): you must demonstrate that you have executed one
or more tests of your implementation, examined the results and drawn larger conclusions
about the behaviour of your implementation, including how it may be improved.
● 10% = Decomposition: in the eyes of the grader, your implementation must demonstrate a
reasonable object oriented decomposition — i.e. encapsulation, polymorphism and inheritance.
● 5% = Efficiency: in the eyes of the grader, your implementation must be maximally efficient with
respect to running time and required space. Despite the name, your submission for Part 2 is not
required to be faster than merge sort. However, it is required to be faster than the quadratic
sorting algorithms.
● 5% = Style: in the eyes of the grader, your implementation must be well-commented and use
intelligently-named variables and functions.
● 5% = Documentation: all required documents, “stopwatch” charts, running times and descriptions
must be clear and unambiguous, and these must match the true running time and true space of
your implementation.
