# About this lab

Besides learning about data structures, CS2C will explore many algorithms. Some algorithms are used within the data structures and arise as natural support for those structures. Other algorithms are solutions to problems in their own right, and the data structures are used as a means to an end - as an expedient toward implementing the algorithm. As a taste of this, in this lab you will implement an algorithm that will make use of simple vector data structures. It is a solution to the Subset Sum Problem, though not necessarily the most efficient one.

If you're hoping to enroll in my section of CS 2C at Foothill, this will also be your first lab. You will submit it at the end of Week 1. It is my version of a lab originally created by Michael Loceff. It is one of my favorite labs, and I hope it will be yours too.

There are a total of 9 labs, with one due at the end of most weeks of the quarter. Each will be slightly more challenging than its predecessor.

Since we have to cover a lot of material quickly, we won't have time to review topics already covered in CS 2B. We'll be off to a running start. So I figured I should share the first lab specs with you even as you're considering enrolling.

The skills you need to complete this lab successfully are not great, but they are a baseline from which we'll start. That is, I'll take for granted that you're already comfortable with all the concepts you need for this lab at the beginning of 2C.

Calibrate your expected comfort and success level in 2C by attempting this lab on your own without outside help.

● If it takes you no more than a day to finish, you'll be at home in 2C.

● If it takes you up to a week, then you'll still be fine after reviewing the topics covered in CS 2B.

● If it takes longer, your 2C experience and likely success may be correspondingly affected. If you can't finish it on your own within a week, then you should do or review CS2B material first.

I hope you find this information helpful in designing a fulfilling and enriching path through the maze of course offerings.

Best of luck!

&

## The Subset Sum Problem

*by Michael Loceff*

A version of the subset sum problem is a pair (S, t), where S = {x1, x2, ... xn} is a set of positive integers and t (the target) is a positive integer. The problem asks for a subset of S whose sum is as large as possible, but not larger than t. The problem has many useful applications. In transportation, we may have a truck that can carry at most t pounds, and we wish to maximize the load based on a loading dock filled with packages with varying discrete weights. A radio show host or program manager may want to select group of tunes or commercials from a set whose total running time is as close to the duration of the show or commercial break (perhaps three hour FM music show, or a two minute commercial break) without going over the time allotment.

## The Algorithm - Simple Formulation

Here is what you can call a brute-force method:

1. Form all possible subsets (also called the "power set of S"). For example, if S = {4, 1, 7}, the power set of S is the set containing {} (the empty set), {4}, {1},{7}, {4, 1}, {1, 7}, {4, 7}, and {4, 1, 7}.

2. Find the subset whose members add up to the largest number possible <= t.

with just the new item. Then after the second item comes along, I can form two more sets in the same way, bringing the total to 4. Then 8, and 16, and so on. It's easy to see how this is simply the value 2N for N items. That's the number of subsets I can form from N items.

Now the cool thing is that we can follow this exact technique to iteratively generate our subsets. Start off with an empty set and, iterating through the set of all items, add each item to every set you already have to generate a whole bunch of * new *sets. They're

*in the sense that you haven't yet seen (or evaluated) any of them.*

*new*As you can see, this is a cool way to generate your subsets, but it does nothing for the running time of our search. It's still going to take time proportional to 2N to visit all the candidate sets.

What can we do to speed it up? Hmm… One way is to reject all unpromising searches as soon as we know for sure.

"How do we know" you ask? Well, you simply keep track of the running total (sum of elements) of eachset.

The moment you find a set whose total is greater than the target, you can immediately ignore all its * descendants *without even looking. By

*I mean every set of which this*

*descendants,**set is a subset.*

*unviable*Suppose you identify one such subset early on… say when you have M items still left to process. That means you've effectively pruned out 2M potential candidates without needing to explicitly test them. (Imagine this - Just two quarters ago, you were mind-blown by the power of binary search over linear. But if you think about it, you'll see that's exactly what's going on here at various levels of intensity. Every time you find an unviable set, all its descendants are gone.

Clearly, it makes sense to prune sets early in the process (when M is close to N), right? In the best possible case, you encounter such descendant-killers early in the lineage of a set - when it's only a few items big.

Note that every singleton set which is unviable effectively cuts the remaining search space in half. Intuitively this makes sense because it is the same as saying that you first scan all items in linear time and discard elements greater than the target. If you discarded K items, you have thereby just reduced your search space by 2K items.So it seems like this strategy should on average cut the search space by a factor of somewhere between 0.5 and some small number (which should be 1/2N). Knowing this you can put some definite upper and lower bounds of performance on this algorithm in your mind: It is at least as efficient as linear search and at most as efficient as binary search.

I think I can hear you say "This is really screaming out for an immediate optimization - Skip elements larger than the target while iterating over the elements" - Yeah, that's an awesome thumb rule. No doubt you will find more like that. But remember they are all just special cases of your overarching plan - * to discard unviable candidates as soon as you find them, and thereby prune the search space of all its descendants*. You can get extra credit by discussing in the forums such thumb-rules as you discover them.

**High-level plan**

**High-level plan**

Maintain a vector of viable candidates. Initially start out with just an empty set. Then gradually add viable sets to this list as you process each item from the master set in turn. At any time, if you find an exact match to the requested target value, you can output that set and end the search.

**Implementation specifics**

**Implementation specifics**

You need to create a template class called Set. So your list of candidates can simply be a std::vector<Set>. However, since your Set is a template class, I would expect to instantiate it using something like std::vector<Set<int>> or std::vector<Set<SongEntry>>, where I, as the user of your class decide what things I want sets of. It's up to me to make sure that if I use SongEntry as my template parameter, my SongEntry class behaves like an integer, in supporting the ability to be added to an integer to yield an integer.

Since this is going to be a template class, you'll submit a single source file, a file called Set.h. I'll give you the skeleton of the file, which also defines its API. You can add your own helper methods if you want, but don't change the signatures of the methods I've already given you. The portions marked TODO are the ones you need to complete. It's not a great deal of code, and probably just under a hundred lines. But you'll need to think through the problem carefully to write it.

**Further notes**

**Further notes**

1. You don't want to iterate over each candidate set to calculate the sum of its elements in real time. Instead, maintain a sum member in your Set class such that every Set object knows its own sum.

2. Since Set is a template class, you have no idea what I might templatize it with. For all you know my items may be large objects (e.g. an mp3 file may be contained within each SongEntry object). Thus you can't afford to make copies of each element in every set to which it belongs. Allow instead for your Set to have a pointer to the vector of actual objects stored separately. And then have a separate vector of elements which simply keeps indices of objects in the master list to represent their existence in a particular set. Every Set object instantiated with the same vector of objects should store the same pointer. Although the contents of this master list is vulnerable to change from outside (since it actually lives outside the class), trying to overcome this shortcoming by making it a static copy of the supplied master list doesn't seem to me to be such a good tradeoff. In the interests of learning the algorithm, we'll assume a certain level of intelligence and self-preservation instinct in the users of your class.

**Some useful tips (you'll be glad you read)**

**Some useful tips (you'll be glad you read)**

1. You'll have a nested loop in your search method: One over all the items in your master vector, and another over all the subsets in your vector of candidates. This inner loop does not have a growing upper limit if you end up adding subsets inside the loop. So make sure that at each pass over the inner loop, you only add direct descendants of the sets you started with. Recall that by * descendants of a set, *I mean all the sets you can obtain by adding one or more extra elements to it. A direct descendant is a descendant that differs from its parent by exactly one element.

2. You can and should test in your inner loop to see if you have added a new sub-list whose sum() == t because, if you have, you are done and should break out of the outer loop. Normally there will be zillions of subsets and we can usually reach the target long before we have tested them all.

Here is my skeleton code. You need to flesh it out and make sure it compiles. I also provide a test main.cpp (which you shouldn't submit).

Don't copy and paste the above code into your IDE. Besides the risk of pasting in invisible characters that throw you for a loop when debugging, weirdly enough, I've received a significant amount of feedback from past students who said that seeing and typing helps with understanding (even if you think you're not paying attention as you copy it).

Here is your sample main. Make sure to **not **submit it or your code won't build. This is just to get your own testing started:

4 of the 6 extra credit points are for passing optional tests beyond those worth the first 20 points. If you keep refining your code until all reported bugs have been removed, you get 24/20.

2 of the extra credit points are for meaningful and original discussions of the interesting related topics. They don't have to be posted in our Canvas forum. You can post them anywhere on the internet (E.g. Quora, Facebook, or your personal blog). But you must at least post a link to your content on Canvas and share a short summary. Your comment does not even have to be recent- just original and interesting.

**Some questions to get you started**

**Some questions to get you started**

1. Does a particular order of processing the master set elements work better than others (e.g. Does sorting the list of items help?)

2. Does the value of the target matter (i.e. are certain targets likely to be reached sooner)?

3. Does the nature of the probability distribution from which the master set elements are drawn matter? (e.g. Can the algorithm take advantage of the fact if you knew that the master set elements were drawn from a particular probability distribution (e.g. a Gaussian, uniform or exponential distribution, etc.)

**Self calibration for 2C**

**Self calibration for 2C**

If you complete this lab on your own, only looking for information and/or help from your friends (or online) to understand what to do next, you'll get a very good idea of how you will fare in 2C. The remaining labs will be of the same (or gradually increasing) level of difficulty as this one.

If you enjoyed working on this code without being frustrated by endless test failures, that means you will have a fantastic time in 2C.

Otherwise, you can still have a fantastic time by investing a little bit of effort in polishing up your knowledge of C++ before the quarter starts.

Happy Hacking,

&

## Comments