CS 1301 – Ch 6, Handout 1


CS 1302 – Chapter 21 Sets

We will cover Sections 1-6 in Chapter 21.

Section 21.1 – Introduction

1. In this chapter we will study the Set and Map interfaces and their common implementations.

Section 21.2 – Sets

1. Set is a sub-interface of Collection that:

a. Doesn’t allow duplicates. There are no two elements, e1 and e2, in the set such that e1.equals(e2). mySet.add(e1) simply returns false if e1 already exists.

b. Doesn’t provide random (positional) access. Neither the Collection interface nor the Set interface specifies a get(pos) method.

2. Java provides three common implementations:

a. HashSet – Doesn’t guarantee any particular ordering. If you iterate over a set, you will see all the elements, but they will not be (in general) in the order that you added them.

b. LinkedHashSet – Elements are ordered according to the order they were added.

c. TreeSet – Elements are ordered according to Comparable or Comparator.

3. Speed: Consider add, remove, contains

a. HashSet – Very fast, O(1)*. Essentially a constant amount of time no matter how many items are in the set.

b. LinkedHashSet – Very fast, O(1)*

c. TreeSet – Fast, O(log n)*

* This is called Big ‘O’ notation. You will learn about this in CS 3410. It describes how the running time of an algorithm grows as the number of items, n, grows. We will briefly discuss the graph at the top of this page: http://bigocheatsheet.com/. Another reference:

http://infotechgems.blogspot.com/2011/11/java-collections-performance-time.html
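To make the "no duplicates" rule and the three orderings concrete, here is a minimal sketch. The class name SetOrderDemo, the helper fill, and the sample words are assumptions made up for illustration; they are not from the textbook.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {

    // Hypothetical helper: adds every word to the given set and returns it.
    static Set<String> fill(Set<String> set, List<String> words) {
        set.addAll(words);
        return set;
    }

    public static void main(String[] args) {
        // "apple" appears twice on purpose.
        List<String> words = Arrays.asList("pear", "apple", "fig", "apple");

        Set<String> probe = new HashSet<>();
        System.out.println(probe.add("apple")); // true: newly added
        System.out.println(probe.add("apple")); // false: already present

        // HashSet: all three distinct words, but in no guaranteed order.
        System.out.println(fill(new HashSet<>(), words));
        // LinkedHashSet: insertion order, [pear, apple, fig]
        System.out.println(fill(new LinkedHashSet<>(), words));
        // TreeSet: natural (alphabetical) order, [apple, fig, pear]
        System.out.println(fill(new TreeSet<>(), words));
    }
}
```

Only the HashSet line can print in a different order from run to run or machine to machine; the other two orders are guaranteed by the classes themselves.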


Section 21.2.1 – The HashSet Class

1. HashSet is an implementation of the Set interface that provides no guarantee of the order of elements when iterated over. It also doesn’t introduce any methods so the only behaviors a HashSet has are those specified in the Collection interface.

2. Creating a HashSet.

a. We can create a HashSet with the no-arg constructor: HashSet(). For example:

Set<String> hsCities = new HashSet<>();

The reference type can be Set or HashSet (or Collection).

b. We can also create a HashSet from any other collection using this constructor:

HashSet(c:Collection<? extends E>)

For example:

ArrayList<String> alCities = new ArrayList<>();
...
Set<String> hsCities = new HashSet<>(alCities);

3. A HashSet implements all the Collection methods: add(o), addAll(collection), clear, contains(o), containsAll(collection), equals(o), isEmpty, iterator, remove(o), removeAll(collection), retainAll(collection), size.

4. We can iterate over a HashSet using a for-each loop or an iterator. As we noted earlier, there is no guarantee of order. For example:

Set<String> names = new HashSet<>();
names.add("cat"); names.add("dab"); names.add("fia");
names.add("fre"); names.add("gor"); names.add("pet");

for(String name : names) {
    System.out.print(name + " ");
}
System.out.println();

Iterator<String> iter = names.iterator();
while(iter.hasNext()) {
    System.out.print(iter.next() + " ");
}

Output (same for both loops): dab cat fre gor pet fia

5. As mentioned previously, filtering a collection means iterating over it and selectively removing certain elements. The preferred way to do that is to use an Iterator. For example, to remove names that contain the letter “a”:

Set<String> names = new HashSet<>();
...
Iterator<String> iter = names.iterator();
while(iter.hasNext()) {
    if(iter.next().contains("a"))
        iter.remove();
}
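As a complete, runnable version of the Iterator filter, the sketch below wraps it in a method and sets it beside Collection.removeIf, a standard Java 8+ method that gives the same result in one line. The class and method names (FilterDemo, removeWithIterator, removeWithRemoveIf) are made up for this illustration; the handout itself only uses the Iterator form.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class FilterDemo {

    // Removes every name that contains the given text, using an explicit Iterator.
    static void removeWithIterator(Set<String> names, String text) {
        Iterator<String> iter = names.iterator();
        while (iter.hasNext()) {
            if (iter.next().contains(text)) {
                iter.remove(); // removal goes through the iterator, so iteration stays valid
            }
        }
    }

    // Same filter written with Collection.removeIf (Java 8 and later).
    static void removeWithRemoveIf(Set<String> names, String text) {
        names.removeIf(name -> name.contains(text));
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("cat", "dab", "fia", "fre", "gor", "pet"));
        Set<String> b = new HashSet<>(a);

        removeWithIterator(a, "a");
        removeWithRemoveIf(b, "a");

        System.out.println(a.equals(b)); // true: both hold fre, gor, pet
        System.out.println(a.size());    // 3
    }
}
```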

6. We can store instances of a custom class in a HashSet, but you must override hashCode and equals. We will not consider this in this class; however, you will learn what a hash code is in CS 3410.

7. In Lab 10 we did an experiment by timing how long it took to remove 25,000 integers from a HashSet, ArrayList, and LinkedList containing 50,000 integers, then 100,000, …, 200,000. My results are shown below:

Homework

1. Write a method, removeLongNames that accepts a set of names and an integer, len. The method should remove any names from the set with length greater than len.

2. Write a method, separateLongNames that accepts a set of names and an integer, len. The method should remove any names from the set with length greater than len and return a set of the names that were removed.

3. Write a method, getNamesByLength that accepts a list of names and two integers, len1 and len2. The method should return a set with the names in the list that have length between (inclusive) len1 and len2.

4. Write a method, getUniqueNames that accepts a list of names and returns a list with duplicates removed. Example: getUniqueNames({“alpha”, “beta”, “alpha”, “gamma”}) -> {“alpha”, “beta”, “gamma”}. Hint: use a HashSet and before returning, convert it to a list.

5. Write a method, countVowels that accepts a string and returns the number of vowels (lower case).

6. Write a method, removeDuplicates that accepts two sets of names and removes from each set the names that are in common between the two sets. For example: removeDuplicates( set1=[“a”,“b”,“c”,“d”,“e”], set2=[“z”,“a”,“p”,“d”] ), when complete, set1=[“b”,“c”,“e”], set2=[“z”,“p”]. Hint: find the intersection, and then remove the intersection from each set.


Section 21.2.2 – The LinkedHashSet Class

This section is optional and will not be tested on.

1. LinkedHashSet is a subclass of HashSet as shown in the class diagram on the right. LinkedHashSet is identical to HashSet except that the order of insertion is preserved. For example:

private static void testLinkedHashSet() {
    Set<String> names = new LinkedHashSet<>();
    names.add("cat"); names.add("dab"); names.add("fia");
    names.add("fre"); names.add("gor"); names.add("pet");

    for(String name : names) {
        System.out.print(name + " ");
    }
}

Output: cat dab fia fre gor pet

Section 21.2.3 – The TreeSet Class (TreeSet of Primitives, using only Collection Interface Methods)

1. The TreeSet class is an implementation of the Set interface as shown in the class diagram on the right. A TreeSet is an ordered set where elements are ordered according to Comparable or Comparator. A TreeSet can be created with no arguments, a Collection, or a Comparator.

2. Example – A TreeSet of Strings

TreeSet<String> tsCities = new TreeSet<>(Arrays.asList("New York", "Atlanta",
        "Savannah", "Tampa", "Durango"));

for(String city : tsCities) {
    System.out.print(city + " ");
}

Output: Atlanta Durango New York Savannah Tampa
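The example above uses the Collection constructor. The following sketch shows a Comparator constructor as well, using the same city names. The class and method names are assumptions for illustration, and the reverse-order comparator is just one convenient choice.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TreeSetConstructors {

    // Collection constructor: elements are sorted by their natural order (Comparable).
    static TreeSet<String> natural(List<String> items) {
        return new TreeSet<>(items);
    }

    // Comparator constructor: the set starts empty and sorts by the comparator.
    static TreeSet<String> reversed(List<String> items) {
        Comparator<String> reverse = Comparator.reverseOrder();
        TreeSet<String> set = new TreeSet<>(reverse);
        set.addAll(items);
        return set;
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("New York", "Atlanta", "Savannah", "Tampa", "Durango");

        System.out.println(natural(cities));  // [Atlanta, Durango, New York, Savannah, Tampa]
        System.out.println(reversed(cities)); // [Tampa, Savannah, New York, Durango, Atlanta]
    }
}
```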

Homework

7. Write a method, getWordsAlphabetic which accepts a comma-delimited string of words, words, and returns a set of the unique words in alphabetical order. For example:

getWordsAlphabetic("the,world,is,good,a,world,cat,is")

returns the set: [a, cat, good, is, the, world].

8. Write a method, getDomainsAlphabetic which accepts a comma-delimited string of email addresses and returns a list of the unique domains in alphabetical order. The format for an email address is: local-part@domain. For example, if the input is:

"[email protected],[email protected],[email protected],[email protected],[email protected],[email protected]

then the returned list is:

[ant.edu, gmail.com, inorbit.com, run.com, run.edu]

Hint: very similar to the last problem except that you have to parse off the domain, and return a list instead of a set.

Section 21.2.3 – The TreeSet Class (TreeSet of Custom Objects)

1. A TreeSet can hold instances of a custom class provided the class implements Comparable or a Comparator exists which is supplied in the constructor.

2. Example – A TreeSet of Employees.

a. Suppose we have an Employee class:

public class Employee {
    private String lName;
    private String fName;
    private int ssn;
    private double salary;

    public Employee(String lName, String fName, int ssNum, double salary) {
        ...
    }
    ...
}

b. And a comparator for employees that compares last and first name:

public class EmployeeNameComparator implements Comparator<Employee> {
    public int compare( Employee e1, Employee e2 ) {
        int diff = e1.getLastName().compareTo(e2.getLastName());

        if( diff != 0 )
            return diff;
        else
            return e1.getFirstName().compareTo(e2.getFirstName());
    }
}

c. Then, we can create a TreeSet of Employees using the comparator:

Employee e1 = new Employee("Boggs", "Kay", 716533892, 12.57);
Employee e2 = new Employee("Lyton", "Ben", 476227851, 77.88);
Employee e3 = new Employee("Boggs", "Amy", 553572246, 22.32);
Employee e4 = new Employee("Dern", "Donald", 243558673, 23.44);

TreeSet<Employee> empsByName = new TreeSet<>(new EmployeeNameComparator());

empsByName.add(e1); empsByName.add(e2); empsByName.add(e3); empsByName.add(e4);

System.out.println("\nSorted on name");
for(Employee e : empsByName) {
    System.out.println(e);
}

Output:

(Boggs, Amy - 553572246, 22.32)
(Boggs, Kay - 716533892, 12.57)
(Dern, Donald - 243558673, 23.44)
(Lyton, Ben - 476227851, 77.88)

d. Then, we can create a TreeSet of Employees using a comparator to compare employees based on their SSN:

TreeSet<Employee> empsBySsn = new TreeSet<>(new EmployeeSSNComparator());

e. And use the addAll method to add the employees from empsByName:

empsBySsn.addAll(empsByName);

System.out.println("\nSorted on SSN");
for(Employee e : empsBySsn) {
    System.out.println(e);
}

Output:

(Dern, Donald - 243558673, 23.44)
(Lyton, Ben - 476227851, 77.88)
(Boggs, Amy - 553572246, 22.32)
(Boggs, Kay - 716533892, 12.57)

3. How do we see if a TreeSet contains a custom object (or remove one)? As we saw in Labs 9 and 10, we use a dummy object. For example, suppose we have a TreeSet of Employee objects:

TreeSet<Employee> tsEmployees = new TreeSet<Employee>( new EmployeeSSNComparator() );

tsEmployees.add( new Employee("Green", "Xavier", 338290448, 45.99) );
...

and we want to remove an Employee and we only know the SSN. We can create a dummy Employee object with the SSN and made-up values for name and salary (or provide a constructor that only takes the SSN).

Employee dummy = new Employee("Doe", "Jane", 338290448, 0.00);
tsEmployees.remove(dummy);

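The remove method uses the Comparator to find an Employee object in the TreeSet that matches the dummy. Two employees match when compare returns 0, which here means they have the same SSN; the made-up name and salary are ignored, and the equals method is not used.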
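The dummy-object technique can be tried end to end with the sketch below. The Employee class here is a pared-down stand-in, and the body of EmployeeSSNComparator is an assumption, since the handout does not show it; the getter name getSsn, the class DummyRemoveDemo, and the second employee are likewise made up for illustration.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class DummyRemoveDemo {

    // Pared-down stand-in for the handout's Employee class.
    static class Employee {
        private final String lName;
        private final String fName;
        private final int ssn;
        private final double salary;

        Employee(String lName, String fName, int ssn, double salary) {
            this.lName = lName;
            this.fName = fName;
            this.ssn = ssn;
            this.salary = salary;
        }

        int getSsn() { return ssn; }

        @Override
        public String toString() {
            return "(" + lName + ", " + fName + " - " + ssn + ", " + salary + ")";
        }
    }

    // Assumed implementation: orders employees by SSN only.
    static class EmployeeSSNComparator implements Comparator<Employee> {
        public int compare(Employee e1, Employee e2) {
            return Integer.compare(e1.getSsn(), e2.getSsn());
        }
    }

    // Builds a small sample set ordered by SSN.
    static TreeSet<Employee> sample() {
        TreeSet<Employee> set = new TreeSet<>(new EmployeeSSNComparator());
        set.add(new Employee("Green", "Xavier", 338290448, 45.99));
        set.add(new Employee("Dern", "Donald", 243558673, 23.44));
        return set;
    }

    public static void main(String[] args) {
        TreeSet<Employee> tsEmployees = sample();

        // Only the SSN matters; name and salary are placeholders.
        Employee dummy = new Employee("Doe", "Jane", 338290448, 0.00);

        System.out.println(tsEmployees.contains(dummy)); // true: compare(...) returns 0 for Green
        System.out.println(tsEmployees.remove(dummy));   // true: Green is removed
        System.out.println(tsEmployees.size());          // 1
    }
}
```

Note that this stand-in Employee does not override equals, and the lookup still succeeds, because the TreeSet relies only on the comparator.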


Homework

None right now!

Section 21.3 – Performance of Sets & Lists

1. The author did a comparison of sets and lists doing 50,000 calls to contains and remove. The results were timed (milliseconds) and are shown below.

             HashSet   LinkedHashSet   TreeSet   ArrayList   LinkedList
contains()   20        27              47        39,802      52,197
remove()     27        26              34        16,196      14,870
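A rough way to see this gap yourself is sketched below. This is not the author's benchmark: the class name ContainsTiming, the helper methods, and the size n = 10,000 are assumptions, and the timing is deliberately crude (no warm-up, a single run), so the numbers will differ from the table and from machine to machine.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsTiming {

    // Calls contains() once for each value 0..n-1 and returns how many were found.
    static int countHits(Collection<Integer> c, int n) {
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (c.contains(i)) {
                hits++;
            }
        }
        return hits;
    }

    // Returns the elapsed time in milliseconds for n calls to contains().
    static long timeMillis(Collection<Integer> c, int n) {
        long start = System.nanoTime();
        countHits(c, n);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 10_000; // assumed size, kept small so the list version finishes quickly

        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        Set<Integer> set = new HashSet<>(list);

        System.out.println("HashSet:   " + timeMillis(set, n) + " ms");
        System.out.println("ArrayList: " + timeMillis(list, n) + " ms");
    }
}
```

The list version has to scan elements one by one for each call, so its total work grows roughly with n times n, while the set version does a roughly constant amount of work per call.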

Section 21.4 – Case Study: Counting Keywords

1. The CountKeywords.java program presented in this section is a neat, simple example of using a set.

Problem: count the total number of occurrences of Java keywords (e.g. abstract, assert, boolean, etc.) in a Java text file.

Algorithm:

Create HashSet with all Java keywords (e.g. abstract, boolean, etc.)
Loop over all words in file
    If word is in keywords set
        Increment count

2. Let’s modify the problem slightly: Write a method that accepts a File object and returns the count of the total number of occurrences of Java keywords in a Java text file.

public static int countKeywords(File file) throws Exception {
    // Array of all Java keywords + true, false and null
    String[] keywordString = {"abstract", "assert", "boolean",
        "break", "byte", "case", "catch", "char", "class", "const",
        "continue", "default", "do", "double", "else", "enum", "extends",
        "for", "final", "finally", "float", "goto", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new",
        "package", "private", "protected", "public", "return", "short",
        "static", "strictfp", "super", "switch", "synchronized", "this",
        "throw", "throws", "transient", "try", "void", "volatile", "while",
        "true", "false", "null"};

    Set<String> keywordSet = new HashSet<String>(Arrays.asList(keywordString));
    int count = 0;
    Scanner input = new Scanner(file);

    while (input.hasNext()) {
        String word = input.next();
        if (keywordSet.contains(word))
            count++;
    }

    input.close();
    return count;
}

How would you modify the solution above to:

a. Return a set of the distinct keywords in the order they occur?

b. Return a set of the distinct keywords in alphabetical order?

3. The solution above has at least three problems:

a. It counts keywords in comments.

b. It counts keywords in string literals.

c. It does not count the situation where there is no space after a keyword. For example, if(ssNum==0) ssNum++; does not count the “if”.

Exercise 21-3 is an extension of this problem and it is rated a double-star (**) problem. Actually, it should be triple-star. There, the author asks you to consider just a and b above (he didn’t mention c). I looked at the author’s solution and it doesn’t completely do the job.
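One possible way to address problem (c) alone is to split the input on runs of non-word characters instead of on whitespace, so that if(ssNum==0) yields the token if. The sketch below is an assumption-laden illustration, not the author's solution: it reads from a String rather than a File to stay self-contained, uses an abbreviated keyword list, and still counts keywords inside comments and string literals (problems a and b).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class KeywordTokens {

    // Abbreviated keyword list for the sketch; the full list is in countKeywords.
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "if", "else", "for", "while", "int", "return",
            "class", "public", "static", "void"));

    // Counts keyword occurrences, treating any run of non-word characters as a separator.
    static int countKeywords(String source) {
        int count = 0;
        try (Scanner input = new Scanner(source)) {
            input.useDelimiter("\\W+"); // \W matches anything other than a letter, digit, or underscore
            while (input.hasNext()) {
                if (KEYWORDS.contains(input.next())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countKeywords("if(ssNum==0) ssNum++;"));   // 1
        System.out.println(countKeywords("int x=0;while(x<3)x++;")); // 2
    }
}
```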

4. There are companies that specialize in parsing data: converting data from one format to another, or extracting data from multiple sources: pdf, word, etc. I know someone who worked on a project where the code scanned pdfs, used OCR to convert to text, and then extracted the data and did stuff with it.


Section 21.2.3 – The TreeSet Class (SortedSet & NavigableSet Methods)

This section is optional and will not be tested on.

1. Above, we showed that TreeSet is-a Set. Actually, there are two interfaces in between (as shown in the class diagram on the right) that prescribe methods that provide access to certain elements.

Method                                 Description
first():E                              The first (smallest) element is returned
last():E                               The last (largest) element is returned
headSet(to:E):SortedSet<E>             Returns a SortedSet of elements that are strictly less than toElement. {x|x<toElement}
tailSet(from:E):SortedSet<E>           Returns a SortedSet of elements greater than or equal to fromElement. {x|x>=fromElement}
subSet(from:E,to:E):SortedSet<E>       Returns a SortedSet of elements between fromElement, inclusive, and toElement, exclusive. {x|fromElement <= x < toElement}

a. See Lab 10 for examples of these methods.

b. Note that headSet, tailSet, subSet return a SortedSet. This is an odd structure. Consider the documentation for the headSet method:

Returns a view of the portion of this set whose elements are strictly less than toElement. The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa.

Thus, there may be situations where you might want to create a TreeSet from the SortedSet in order to break this bond. In other words, you want to preserve the result of headSet and then subsequently modify the TreeSet or vice-versa.
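The difference between a live view and a copy can be seen in the following sketch. The class name HeadSetViewDemo, the helper snapshotBelow, and the sample numbers are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class HeadSetViewDemo {

    // Hypothetical helper: copies the headSet view into a new, independent TreeSet.
    static SortedSet<Integer> snapshotBelow(TreeSet<Integer> set, int to) {
        return new TreeSet<>(set.headSet(to));
    }

    public static void main(String[] args) {
        TreeSet<Integer> nums = new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50));

        System.out.println(nums.first());        // 10
        System.out.println(nums.last());         // 50
        System.out.println(nums.tailSet(30));    // [30, 40, 50]
        System.out.println(nums.subSet(20, 40)); // [20, 30]

        SortedSet<Integer> view = nums.headSet(30);       // live view of nums
        SortedSet<Integer> copy = snapshotBelow(nums, 30); // independent copy

        nums.add(5);              // change the underlying set
        System.out.println(view); // [5, 10, 20]  the view sees the new element
        System.out.println(copy); // [10, 20]     the copy does not

        view.remove(10);          // change the view
        System.out.println(nums); // [5, 20, 30, 40, 50]  the underlying set changed too
    }
}
```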

2. Next, we consider a few of the methods specified in the NavigableSet interface. The first four below are similar to the methods in SortedSet except that they return a single item (or null if no such element exists).

Method                                                    Description
floor(e:E)                                                The largest element <= e is returned
lower(e:E)                                                The largest element < e is returned
ceiling(e:E)                                              The smallest element >= e is returned
higher(e:E)                                               The smallest element > e is returned
pollFirst()                                               Returns the smallest element and removes it
pollLast()                                                Returns the largest element and removes it
*headSet(to:E,in:bool):NavigableSet<E>                    Returns elements {x|x<=to}, when in=true
*tailSet(from:E,in:bool):NavigableSet<E>                  Returns elements {x|x>=from}, when in=true
*subSet(from:E,in1:bool,to:E,in2:bool):NavigableSet<E>    Returns elements {x|from<=x<=to}, when in1=true and in2=true
*descendingIterator():Iterator<E>                         Returns an iterator over the elements in this set, in descending order.
*descendingSet():NavigableSet<E>                          Returns a reverse order view of the elements contained in this set.

* Not shown in class diagram above.

Similar to SortedSet above, the NavigableSet returned by headSet, tailSet, subSet, and descendingSet is a view of the underlying set, and changes to either are reflected in the other.
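The NavigableSet methods in the table can be exercised with a short sketch like the one below. The class name NavigableDemo, the helper sample, and the numbers are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class NavigableDemo {

    // Builds the sample set used throughout: [10, 20, 30, 40, 50].
    static TreeSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50));
    }

    public static void main(String[] args) {
        TreeSet<Integer> nums = sample();

        System.out.println(nums.floor(30));   // 30   largest element <= 30
        System.out.println(nums.lower(30));   // 20   largest element < 30
        System.out.println(nums.ceiling(35)); // 40   smallest element >= 35
        System.out.println(nums.higher(50));  // null nothing is greater than 50

        System.out.println(nums.headSet(30, true));          // [10, 20, 30]
        System.out.println(nums.tailSet(30, false));         // [40, 50]
        System.out.println(nums.subSet(20, true, 40, true)); // [20, 30, 40]
        System.out.println(nums.descendingSet());            // [50, 40, 30, 20, 10]

        System.out.println(nums.pollFirst()); // 10, and 10 is removed
        System.out.println(nums.pollLast());  // 50, and 50 is removed
        System.out.println(nums);             // [20, 30, 40]
    }
}
```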
