<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://mikmaks97.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mikmaks97.github.io/" rel="alternate" type="text/html" /><updated>2026-03-10T02:50:08+00:00</updated><id>https://mikmaks97.github.io/feed.xml</id><title type="html">Max is curious</title><subtitle>Curious about networks, geography, video games, sustainability, Muay Thai, chess, and rock climbing. I&apos;m constantly learning and understanding how little I know and how broad and deep the world is. I&apos;m also always trying to support and make positive change. Hope to spread smiles and love with my work.</subtitle><entry><title type="html">Does software design matter?</title><link href="https://mikmaks97.github.io/tech/2025/05/07/software-design.html" rel="alternate" type="text/html" title="Does software design matter?" /><published>2025-05-07T20:20:00+00:00</published><updated>2025-05-07T20:20:00+00:00</updated><id>https://mikmaks97.github.io/tech/2025/05/07/software-design</id><content type="html" xml:base="https://mikmaks97.github.io/tech/2025/05/07/software-design.html"><![CDATA[<p>Does it make a difference? Can spending time on it be justified?</p>

<p>It’s hard to argue if it matters for a company directly, if it affects its bottom line, which is by and large the main question for most companies. While an argument can be made that thoughtful design makes it easier and faster to modify, extend, and scale software, it’s hard to measure how much this impacts a company’s profit versus an approach of carelessly written code due to the lack of quantifiable evidence.</p>

<p>On the other hand, building as fast as possible with minimal resource expenditure seems like a no-brainer for the goal of maximizing profit, but these tactics directly conflict with the slower, more long-term approach of software design. Does software design then actually hurt profit by standing in contradiction to these tactics, and can we conclude that software design shouldn’t ever be considered because it makes no tangible difference and only hurts company outcomes? Is there maybe some example of when software design does make a positive difference in outcome that we can use as a starting point for rationalizing its existence?</p>

<p>The first thing that comes to mind is contract work. When building software for a client, there’s an understanding that, once built, you hand over the software to the client. The client inherits the software, so if it’s not designed well, they inherit its fragility, rigidity, and issues. In this case, good design has a direct benefit on the long-term outcome for the software and the client inheriting it.</p>

<p>Abstracting this example a bit, it seems anytime software is inherited or changes hands, good design is a worthwhile investment. So then, when does software change hands? Is it only in the case of contract work? It’s obviously much broader than that: software inheritance happens when a new developer onboards to a team, a developer is assigned to extend/modify the functionality of existing software that they didn’t build, an engineer on support needs to understand a piece of software to triage a bug, and so on. Even these few examples occur frequently enough to make a case for spending time on good software design, because on the side of well-designed software the experiences of the people affected by it are intuitive, simple, pain-free, and empowering, and on the other side, they’re frustrating, demotivating, difficult, and never-ending. This makes it feel like good software design isn’t just a self-indulgent activity but makes a real impact, no matter the company or product.</p>

<p>While it may not matter for the company’s goals directly, good design matters to the people working with and inheriting software. The fellow engineers who don’t have to wake up to alerts, or spend weeks reverse engineering a system with no documentation in order to extend its functionality, or deep dive to understand a system to triage some bug, or newcomers who have a faster and easier time onboarding, or the entire development team who is able to build faster and with more confidence. It may not make a direct impact on profit, but it makes an impact on the number of headaches for years to come. If we assume that empathy is a worthwhile value to uphold, then the benefits of good software design are self-evident, and maybe empathy becomes a measure of how good a design is. The more people a design impacts, the better it is.</p>

<p>These smaller impacts of good software design build up to the larger nurturing of a tech culture of collective ownership and consensus, a culture ultimately rooted in empathy. The question then becomes, do empathy and healthy culture matter? While I can’t speak for a company’s leadership, I can say for myself that empathy is a worthwhile value and aspiration and has a ripple effect on the people around me, so I will continue to fight for and champion good software design as one way of serving and supporting my fellow peers and making the impact I can.</p>]]></content><author><name></name></author><category term="tech" /><summary type="html"><![CDATA[Does it make a difference? Can spending time on it be justified?]]></summary></entry><entry><title type="html">3D Models and Animations</title><link href="https://mikmaks97.github.io/art/2021/11/23/models.html" rel="alternate" type="text/html" title="3D Models and Animations" /><published>2021-11-23T20:20:00+00:00</published><updated>2021-11-23T20:20:00+00:00</updated><id>https://mikmaks97.github.io/art/2021/11/23/models</id><content type="html" xml:base="https://mikmaks97.github.io/art/2021/11/23/models.html"><![CDATA[<p>Here is a short curation of 3D modeling and animation work I did in <a href="https://blender.org">Blender</a>.</p>

<p><img src="/assets/images/art/gamepad.png" alt="Gamepad" /></p>
<center><i>Candy Controls</i></center>
<p><br />
<img src="/assets/images/art/skyline.bmp" alt="Skyline" /></p>
<center><i>NYC Skyline</i></center>
<p><br />
<img src="/assets/images/art/icecream.png" alt="Ice Cream" /></p>
<center><i>Ice Cream</i></center>
<p><br />
<img src="/assets/images/art/lilies.png" alt="Lillies" /></p>
<center><i>Lillies</i></center>
<p><br />
<img src="/assets/images/art/train.png" alt="Train" /></p>
<center><i>Subway</i></center>
<p><br />
<img src="/assets/images/art/dandies.png" alt="Dandelions" /></p>
<center><i>Dandies</i></center>
<p><br />
<img src="/assets/images/art/shadow.png" alt="Me" /></p>
<center><i>Me</i></center>
<p><br /></p>

<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/ZzBvvt75YvI" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></center>
<center><i>Two-part tutorial on making an ice cream cone in Blender</i></center>
<p><br /></p>
<center><iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/DIsKv581MzQ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></center>
<center><i>Minecraft animation made in Blender</i></center>]]></content><author><name></name></author><category term="art" /><summary type="html"><![CDATA[3D models made in Blender]]></summary></entry><entry><title type="html">Whimsical Digital drawings</title><link href="https://mikmaks97.github.io/art/2021/11/22/drawings.html" rel="alternate" type="text/html" title="Whimsical Digital drawings" /><published>2021-11-22T20:20:00+00:00</published><updated>2021-11-22T20:20:00+00:00</updated><id>https://mikmaks97.github.io/art/2021/11/22/drawings</id><content type="html" xml:base="https://mikmaks97.github.io/art/2021/11/22/drawings.html"><![CDATA[<p>Just two drawings I made in Photoshop a while ago :)</p>

<p><img src="/assets/images/art/girlandboy.png" alt="Girl and Boy" /></p>
<center><i>Girl and Boy</i></center>
<p><br /></p>

<p><img src="/assets/images/art/blueberryfirefly.jpg" alt="Blueberries and firefly" /></p>
<center><i>Blueberries and Firefly</i></center>]]></content><author><name></name></author><category term="art" /><summary type="html"><![CDATA[Some digital drawings I made in Photoshop]]></summary></entry><entry><title type="html">CU-me Class Scheduler</title><link href="https://mikmaks97.github.io/tech/2021/11/21/cume.html" rel="alternate" type="text/html" title="CU-me Class Scheduler" /><published>2021-11-21T20:20:00+00:00</published><updated>2021-11-21T20:20:00+00:00</updated><id>https://mikmaks97.github.io/tech/2021/11/21/cume</id><content type="html" xml:base="https://mikmaks97.github.io/tech/2021/11/21/cume.html"><![CDATA[<p><a href="https://github.com/mikmaks97/cu-me">Project Link</a></p>

<h1 id="status">Status</h1>

<p>This project used to be live and publicly accessible but is currently archived.</p>

<h1 id="description">Description</h1>

<p>By senior year of college I had honed my class scheduling workflow. I used it every semester and it spanned four browser tabs:</p>
<ol>
  <li>CUNY DegreeAudit to see what courses I still need to take in my college career, including core and major requirements.</li>
  <li>CUNYFirst for next semester’s course catalog. I mainly used the course search and primarily needed course session dates, times,
and session professors.</li>
  <li>RateMyProfessors for professor ratings. When choosing a session I needed to factor which professor was teaching it in addition
to the session date/time because, when possible, it was worth comprimising on date for a better instructor.</li>
  <li>Free College Schedule Maker for a simple interface to experiment with different schedules and keep track of chosen sessions.</li>
</ol>

<p>Working across many tabs and experiencing CUNYFirst’s objectively bad user experience made the class scheduling process a
cumbersome chore. With the skills I had accumulated by my last year of college, I decided to make a web app that integrated
these three services in one experience.</p>

<hr />

<p><strong>Say hi to CU-me.</strong></p>

<p>CU-me is a one-stop for class scheduling for all CUNY students. It works for all 22 campuses for undergraduate and graduate
students. Using your CUNY login you get access to a single-page application with degree requirements on the left side,
course search on the right side, and a schedule canvas in the middle. Course search results include RateMyProfessors ratings.
Underlying the visual experience is a publicly queryable REST API for course searching. Without further ado, here’s a gallery:</p>

<p><img src="/assets/images/cume/login.png" alt="Login page" /></p>
<center><i>Login page</i></center>
<p><br /></p>

<p><img src="/assets/images/cume/tut.png" alt="First-time tutorial modal" /></p>
<center><i>First-time tutorial modal</i></center>
<p><br /></p>

<p><img src="/assets/images/cume/main.png" alt="Main page" /></p>
<center><i>Main interface</i></center>
<p><br /></p>

<p><img src="/assets/images/cume/rmp.png" alt="Search results with RateMyProfessors ratings" /></p>
<center><i>Course search results with RateMyProfessors ratings</i></center>
<p><br /></p>

<p><img src="/assets/images/cume/class.png" alt="Class added to schedule" /></p>
<center><i>A course session added to the schedule</i></center>
<p><br /></p>

<h1 id="tech-writeup">Tech writeup</h1>

<p>This is a Django application with a PostgreSQL database.</p>

<h2 id="database">Database</h2>
<p>The database stores a user’s saved schedule by linking a Django user with selected courses.
User actions trigger background AJAX requests, which update database entries, so changes are automatically saved.</p>

<p><img src="/assets/images/cume/db.png" alt="Database diagram" /></p>
<center><i>Database diagram</i></center>

<h2 id="api">API</h2>
<p>The REST API has one GET endpoint for CUNY class searching. It wraps the CUNYFirst search API without requiring login.</p>

<p>I tried finding various ways to query CUNYFirst without using an automated browser, but could only find an unauthenticated
page that allows interactive search. I use Selenium to interact with the form on this page filling in values from the GET
params of the search request. I then scrape, format, and return the results as JSON.</p>

<h2 id="frontend">Frontend</h2>
<p>When a user logs in with their CUNY credentials, I use them to log into DegreeWorks using Selenium and scrape all degree
requirements that I then present on the main screen. This is also when the user’s schedule data is fetched from the database
or initialized if the user is new. The Selenium process takes a long time, so I utilize Celery, an asynchronous
task queue, to run it as a background task with progress status updates to the user. The login process still takes a long time,
but progress is communicated transparently to the user.</p>

<p>The main schedule consists of multiple overlaid HTML canvases with mouse event listeners. The schedule can be downloaded as an
image at the click of a button.</p>

<hr />

<p>A more complete tech description is available <a href="https://github.com/mikmaks97/CU-me/blob/master/TECH_README.md">here</a>.</p>]]></content><author><name></name></author><category term="tech" /><summary type="html"><![CDATA[A class scheduling web app for CUNY students focused on a simple, informative user experience.]]></summary></entry><entry><title type="html">Enterprise-scale Firewall Decision Trees</title><link href="https://mikmaks97.github.io/tech/2021/11/18/firewall-decision-trees.html" rel="alternate" type="text/html" title="Enterprise-scale Firewall Decision Trees" /><published>2021-11-18T20:20:00+00:00</published><updated>2021-11-18T20:20:00+00:00</updated><id>https://mikmaks97.github.io/tech/2021/11/18/firewall-decision-trees</id><content type="html" xml:base="https://mikmaks97.github.io/tech/2021/11/18/firewall-decision-trees.html"><![CDATA[<p>One of my highlight projects at Bloomberg was building a tool to conduct contextual firewall policy comparison. I ended up
modifying the construction algorithm for an existing data structure, the firewall decision tree, for it to efficiently
process enterprise-scale policies. This is a generalized overview of my work starting with an overview of the domain,
terminology, and problem. Then, I go over various approaches culminating in the most efficient solution. A discussion
of solution pitfalls and further work concludes this post.</p>

<h2 id="a-brief-primer-on-firewall-policies">A brief primer on firewall policies</h2>
<h3 id="definition-and-representation">Definition and representation</h3>
<p>Firewall policies are collections of rules that state what action to take for certain flows. A flow is a 5-tuple consisting
of <code class="language-plaintext highlighter-rouge">(source, destination, protocol, source-port, destination-port)</code> and some examples of actions are <code class="language-plaintext highlighter-rouge">allow</code> and <code class="language-plaintext highlighter-rouge">block</code>.</p>

<p>I will use CIDR notation for the source and destination IPs/subnets.
Most policy rules do not contain source ports, so I will omit them in examples, instead writing a 4-tuple
<code class="language-plaintext highlighter-rouge">(source, destination, protocol, destination-port)</code>.</p>

<p>An example of a rule:
<code class="language-plaintext highlighter-rouge">(1.1.1.0/24, 2.2.2.0/24, tcp, 80) block</code></p>

<p>This rule blocks traffic from <code class="language-plaintext highlighter-rouge">1.1.1.0/24</code> to <code class="language-plaintext highlighter-rouge">2.2.2.0/24</code> on <code class="language-plaintext highlighter-rouge">tcp-80</code>, i.e. <code class="language-plaintext highlighter-rouge">http</code>.</p>

<p>Usually, to declare rules more efficiently, groups are defined, but no matter how rules are
represented, they can be broken down into simple 5-tuples. For example, the composite rule</p>

<p><code class="language-plaintext highlighter-rouge">([1.1.1.0/24, 3.3.3.3], 2.2.2.0/24, tcp, [80, 443]) block</code>, which means <em>block http and https traffic from
1.1.1.0/24 and 3.3.3.3 to 2.2.2.0/24</em> can be broken down into four simple rules:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(1.1.1.0/24, 2.2.2.0/24, tcp, 80) block
(1.1.1.0/24, 2.2.2.0/24, tcp, 443) block
(3.3.3.3, 2.2.2.0/24, tcp, 80) block
(3.3.3.3, 2.2.2.0/24, tcp, 443) block
</code></pre></div></div>

<p>Breaking down composite rules into simple rules can be done via Cartesian product:</p>

<p>\(sources \times destinations \times protocols \times destination\_ports\).</p>

<h3 id="processing">Processing</h3>
<p>Firewall rules are processed in order for matches. When a packet/flow comes in the first matching rule decides what action to take.</p>

<p>For example, for the set of rules in the previous example the flow <code class="language-plaintext highlighter-rouge">(1.1.1.1, 2.2.2.2, tcp, 443)</code> will be matched by the 2nd rule:
<code class="language-plaintext highlighter-rouge">(1.1.1.0/24, 2.2.2.0/24, tcp, 443) block</code>, so the flow will be blocked.</p>

<h2 id="problem">Problem</h2>
<p>A consequence of the rules being processed in order are shadowing. Shadowing is a form of redundancy when one rule
partially or fully overlaps the flows of another rule.</p>

<p>An example of shadowing:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(1.1.1.0/24, 2.2.2.0/24, tcp, 80) allow
(1.1.1.1, 2.2.2.2, tcp, 80) block
</code></pre></div></div>

<p>Packets destined to 2.2.2.2 from 1.1.1.1 on tcp-80 will be allowed despite rule 2 stating they should be blocked because
that flow will be matched by rule 1.</p>

<p>For the shadowing of one rule by another to occur there has to be an intersection between each of the fields for the two rules,
so the following example will not cause shadowing since the d-ports are different:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(1.1.1.0/24, 2.2.2.0/24, tcp, 80) allow
(1.1.1.1, 2.2.2.2, tcp, 443) block
</code></pre></div></div>

<p>Shadowing can cause unintended changes when adding/removing/changing rules in a policy as the true effects of the change
depend on the rule’s position and may not be visible via a surface-level comparison, especially when there are tens of
thousands of rules, whose flow values may be abstracted behind objects with names, which is common to make rule management
more understandable and intentful. Deleting a rule seems like a simple operation, but it may uninentionally expose the flows
below it. Similarly, adding a rule may not only be ineffective if that rule is shadowed by preceding rules, it too can
uninentionally shadow rules unless it is appended at the end of the list of rules.</p>

<p>A second consequence of shadowing is unoptimized policy, and as the number of rules grows longer it not only makes
managing the policy cumbersome and more error-prone, but the policy size may exceed the storage capacity of some equipment like
TCAMs, which have very limited storage capacity to maintain packet-processing speed.</p>

<p>Implementing a tool to do a flow comparison betwen two policies and return the set of unshadowed flows present in both policies,
whose actions differ across the policies, is the subject of the solution.</p>

<h2 id="solution">Solution</h2>
<p>The task of comparing two policies comprises two steps:<br />
  a. Comparing intersecting flows between the policies for differences in action<br />
  b. Reducing the differing flows to only unshadowed ones</p>

<p>These steps need not be completed in this order; step b) can precede step a) by first reducing the set of flows for
each policy to unshadowed only and then comparing these flows with each other.</p>

<h3 id="a-naive-approach">A naive approach</h3>
<h4 id="implementation">Implementation</h4>
<p>An intuitive approach to attaining the list of unshadowed flows for a policy is to start with an empty set of unshadowed
flows, iterate over each rule in order, and subtract the set of unshadowed flows from the flows of this rule.
The resulting flows can be added to the set of unshadowed flows.</p>

<p>The time complexity of this solution is \(O(N^2)\) in the total number of flows. For each rule, we subtract at most \(N\) flows.</p>

<p>For two interscting flows, subtracting one flow from another involves subtracting the source CIDRs and destination CIDRs.
CIDR subtraction is a time-consuming bit operation, which can yield more disjoint CIDRs than started with, the product of which
yields more flows.</p>

<p>As an example, consider subtracting the flows (protocol, port omitted): <code class="language-plaintext highlighter-rouge">(1.1.1.0/30, 1.1.10/30)</code> and <code class="language-plaintext highlighter-rouge">(1.1.1.1, 1.1.1.1)</code>.
<code class="language-plaintext highlighter-rouge">(1.1.1.0/30, 1.1.1.0/30) - (1.1.1.1, 1.1.1.1) = (1.1.1.0, 1.1.1.0/30), (1.1.1.1, 1.1.1.0), (1.1.1.1, 1.1.1.2/31), (1.1.1.2/31, 1.1.1.0/30)</code>.</p>

<p>The subtraction of two flows resulted in four new disjoint flows. The problem size expands rapdidly with each successive subtraction.</p>

<h4 id="improvements">Improvements</h4>
<p>There are several improvements we can make for each rule to improve the performance of this apprach.</p>

<p>Instead of subtracting all unshadowed flows from the current rule’s flow, some of which do not overlap with the flow,
we can prune this set beforehand. We achieve this by using radix trees to store the unshadowed sources and destinations.
This way instead of linearly scanning all the flows, we can narrow down the overlapping sources and destinations in \(log(N)\)
time.</p>

<p>Since we must also consider that the protocols and d-ports overlap, we can bucket the flows by (protocol, d-port) pair,
all flows for one (protocol, d-port) being in the same bucket. Now, for a given rule flow we only need to process the flows
in the flow’s (protocol, d-port) bucket [if such a bucket exists]. And of those flows, we only need to subtract the ones,
whose sources and destinations overlap with the given flow’s sources and destinations, which we find using the radix trees.</p>

<p>The amount of CIDR subtraction done for each flow is greatly reduced with these methods and there is an added benefit of
allowing parallelization. Previously, since each rule’s unshadowed flows depended on the preceding flows, we could not
parallelize the work, but, now, we can bucket all rule flows by (protocol, d-port) in advance and then process all of the buckets
in parallel since there is no overlap between buckets. The rule flows within a bucket must still be processed synchronously.</p>

<p>Unfortunately, the above methods do not improve the asymptotic runtime of the solution and do not reduce enterprise policy
processing time enough for practical use. One simple worst-case scenario that occurs frequently is the rule flow
<code class="language-plaintext highlighter-rouge">(0.0.0.0/0, 0.0.0.0/0, all, all)</code>. This usually appears at the end of a policy, the final “default drop” rule. This flow
overlaps with every flow and since it is at the end we must subtract all the unshadowed flows from it. This single operation
takes an impractical amount of time to seriously consider this solution.</p>

<h3 id="using-firewall-decision-trees">Using firewall decision trees</h3>
<h4 id="original-structure-and-algorithms">Original structure and algorithms</h4>
<p>To solve this problem I conducted an extensive literature review for existing solutions. A series of papers (Liu and Gouda 1237)
described a data structure called the firewall decision tree [FDT], which had several uses:</p>
<ol>
  <li>Policy comparison</li>
  <li>Policy optimization (shadowing analysis)</li>
  <li>Policy testing (more efficient flow matching and coverage analysis)</li>
</ol>

<p>This was a promising approach with benefits beyond solving this problem.</p>

<p>Using the FDT for policy comparison described in the paper:</p>
<ol>
  <li>Construct an FDT for each policy</li>
  <li>Find differing flows between the policies by merging the constructed FDTs</li>
</ol>

<p>The FDT construction process removes shadowed flows. It is more efficient than the aforementioned naive approach. The key to
its efficiency is representing the fields of a flow as numeric intervals, <code class="language-plaintext highlighter-rouge">[start, end]</code> (Liu and Gouda 1241).
Instead of representing sources and destinations as CIDRs, they are represented as numerical intervals in the range 0-2^32.
Protocols can be represented numerically in the range 0-255 via <a href="https://iana.org/assignments/protocol-numbers/protocol-numbers.xhtml">this mapping</a>.
Finally, ports are numbers in the range 0-2^16.</p>

<p>The example flow <code class="language-plaintext highlighter-rouge">(1.1.1.0/24, 2.2.2.0/24, tcp, 80)</code> becomes <code class="language-plaintext highlighter-rouge">([16843008, 16843263], [33686016, 33686271], 16, 80)</code>.</p>

<p>Interval subtraction is less costly than CIDR subtraction. Indeed, no arithmetic or bit manipulation is needed to subtract two intervals:
<code class="language-plaintext highlighter-rouge">[1, 10] - [2, 4] = [1, 1], [5, 10]</code>. For a set of overlapping intervals, we can use <a href="https://www.geeksforgeeks.org/merging-intervals/">this algorithm</a>
to merge them while removing overlaps. The runtime complexity is \(O(NlogN)\) in the number of intervals.</p>

<p>The result of the construction algorithm is a tree whose levels comprise flow field intervals, i.e. level 1 is source intervals,
level 2 is destination intervals, level 3 is protocol intervals, and level 4 is destination port intervals.
At the end of each tree path is an action, so the actions can be considered the last level and they make up the leaves of the tree.
Hence, every path of the tree is a flow, and the total number of unshadowed flows is equal to the number of paths/leaves of the tree.</p>

<p><img src="/assets/images/fdt/fdt.PNG" alt="Firewall decision tree visualized" />
<em>A visual representation of a firewall decision tree. Internally protocols are stored as numeric intervals with equal start and end.</em></p>

<p>The construction is described on page 1243 of Liu and Gouda, but I have abridged it here:</p>
<ol>
  <li>Append each rule flow in order into the tree. Start from the tree’s root level of source intervals.</li>
  <li>Merge flow’s intervals with overlapping tree’s intervals and create nodes for the resulting disjoint intervals accordingly.</li>
  <li>Copy old interval subtrees to new interval nodes.</li>
  <li>Proceed to the children (next level) of interval nodes covered by the flow’s intervals.</li>
  <li>Repeat steps 2-3 until entire flow is merged.</li>
</ol>

<p>Once FDTs for both policies are constructed, the paper describes the comparison algorithm, which consists of making the trees
isomorphic to each other to make them comparable. Essentially, at the end of this process, all the flows in one tree exist
in the other with potentially a different action (so, in fact, the trees are semi-isomoprhic).
It is then a matter of iterating over all the flows to find ones with differing actions (Liu and Gouda 1245).</p>

<hr />

<p>There are several pitfalls to this algorithm in its current state:</p>
<ol>
  <li>Deep copying subtrees is time-consuming.</li>
  <li>Constructing an FDT for each policy and then shaping them to be semi-isomorphic is time-consuming.</li>
  <li>There is no room for parallelization in FDT construction due to the synchronous [appending new paths to the tree] nature of the construction algorithm.</li>
</ol>

<p><img src="/assets/images/fdt/vert.gif" alt="Vertical construction algorithm" />
<em>Vertical construction algorithm for a firewall policy. Compare with horizontal below.</em></p>

<p>Even though the overall efficiency of this solution is superior to the naive method present above (complexity analysis
can be found at Liu and Gouda 1247), the policies it was benchmarked on (Liu and Gouda 1249) have orders of magnitude fewer rules
than the enterprise policies I was working with, so I needed to rethink portions of the algorithm to improve performance.</p>

<h4 id="new-modifications-to-improve-performance">New modifications to improve performance</h4>
<p>My contribution to the solution was redesigning the implementation of this method to avoid any copying,
avoid making the trees isomorphic to each other altogether, and utilize parallelization.</p>

<p><strong>Modification 1:</strong> Instead of constructing two FDTs, one for each policy, and then making them isomorphic to each other, I immediately
construct a combined tree that includes the flows of both policies. In this tree, flows have two action leaves, one from each policy.
If only one of the policies contains a flow, the other action is empty. Since the construction algorithm is responsible for the majority
of the runtime, this signficantly improves performance and space.</p>

<p><strong>Modification 2:</strong> Instead of following a vertical construction approach of merging one flow after another to the tree, I use
a horizontal level-based approach. I merge all the source intervals of all the flows first, then, for each source interval,
merge all the destination intervals, then protocols, then ports. This simultaneously avoids subtree copying and opens up
the potential to parallelize.</p>

<p><img src="/assets/images/fdt/horiz2.gif" alt="Horizontal construction algorithm" />
<em>Horizontal construction algorithm for the same policy. Compare with vertical above.</em></p>

<p>The result of the construction algorithm is a tree, whose differing flows are trivial to find by iterating over all flows
to filter ones with more than one action leaf and whose actions differ.</p>

<p>Further speed enhancements can be made by parallelizing construction and storing interval merging results in a cache so as
not to solve the same interval merging problem more than once because intervals are often repeated, especially for protocols
and ports, both of which have small sets of commonly used values. Target protocols in policy rules are typically tcp (6),
udp (17), and icmp (1) and target ports are http (80), https (443), and dns (53). These auxiliary changes prove effective
in practice.</p>

<table>
  <thead>
    <tr>
      <th>Method</th>
      <th>Sources</th>
      <th>Destinations</th>
      <th>Protocols</th>
      <th>Ports</th>
      <th>Total</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>serial</td>
      <td>0.07</td>
      <td>8.87</td>
      <td>0.86</td>
      <td>4.38</td>
      <td>14.18</td>
    </tr>
    <tr>
      <td>targeted parallel</td>
      <td>0.07</td>
      <td>1.07 (parallel)</td>
      <td>0.86</td>
      <td>4.38</td>
      <td>6.38</td>
    </tr>
    <tr>
      <td>parallel and cache</td>
      <td>0.07</td>
      <td>1.07 (parallel)</td>
      <td>0.86</td>
      <td>1.19 (cache)</td>
      <td>3.19</td>
    </tr>
  </tbody>
</table>

<p><em>A summary of FDT + speed enhancements performance on an enterprise policy with ~100,000 rules [results in minutes].</em></p>

<h4 id="simplifying-the-tree">Simplifying the tree</h4>
<p>Merging overlapping intervals during the construction process can make the flows overly granular. Presenting results is
then cumbersome and the firewall decision tree takes up more space than necessary.</p>

<p><img src="/assets/images/fdt/unopt.PNG" alt="Unsimplified tree" />
<em>A tree with 24 flows even though all of the flows can be encapsulated by 2 flows with unions of intervals.</em></p>

<p>To simplify the reporting of differing flows and structure of the tree, we can union interval nodes whose subtrees are the same.
Starting at the bottom of the tree at the port level, if two port nodes have the same action leaves, we can union the
ports into one combined port node. Advancing up the tree to the protocol level, we check if any protocol nodes have equal subtrees,
in which case we union them. We continue in this manner all the way up to the tree root. The idea of comparing child contents in
this process is inspired by how Merkle trees work. The runtime complexity of this simplification algorithm is \(O(N)\) where
\(N\) is the number of nodes in the tree since we visit each node once.</p>

<p>The result is a simpler, more understandale set of flows.</p>

<p><img src="/assets/images/fdt/opt.PNG" alt="Simplified tree" />
<em>After simplification, the above tree reduces to 2 flows.</em></p>

<h2 id="discussion-and-further-work">Discussion and further work</h2>
<p>The described modifications to the firewall decision tree construction algorithm significantly improve performance and space.
They enable parallelization through level-based construction and simplify the presentation of results. These enhancements
make it feasible to conduct contextual firewall policy comparison at a massive scale.</p>

<p>A consequence of building a tree from the rules of multiple policies as described is that this process supports more than
two policies. In fact, each port node can have as many action leaves as there are policies, and the algorithm simply works.
The other aforementioned benefits of using firewall decision trees are maintained as well.</p>

<p>Further speed improvements could be made by preprocessing the flows before tree construction and splitting by (protocol, d-port)
combo as mentioned in the improvements for the naive implementation. Another approach to consider is merging overlapping
intervals for each field and then connecting intervals to their children to construct the levels of the tree.</p>]]></content><author><name></name></author><category term="tech" /><summary type="html"><![CDATA[Modifications to the firewall decision tree structure to improve performance for enterprise-scale firewall policies]]></summary></entry></feed>