“…to upload files to any user, log in as any user, destroy any farm, run any farm off the road, upload whatever we want, download whatever we want, destroy any data, log in to any third party accounts. We could literally do whatever the heck we wanted with anything we wanted on the John Deere operation center, period.”

Finally, John Deere’s security vulnerabilities have come into my realm of attention, and goodness, it doesn’t look good, folks.

What?

DEFCON 29 featured a presentation by ethical hacker Sick Codes, who managed to gain John Deere’s own access to the machines it sells, with which he could do really whatever he wanted with that equipment.

Here’s the YouTube video in question: https://www.youtube.com/watch?v=zpouLO-GXLo

Shoutout to the scope of the area this Aussie demoed. You got my home region of west-central Illinois! There are a lot of these machines out here.

I want to break this down for non-security folks to appreciate what the heck has happened.

DEFCON 29 happened about a week ago, which is when these vulnerabilities started gaining traction in larger publications like Vice.

DEFCON == Conference for Hacking and Digital Security Enthusiasts. Everyone from the shy computer kid from your old high school to your in-law who works for the NSA goes to this thing. 

The hacking issues specifically relate to the autonomous vehicles from John Deere. One example is the John Deere 7450 ProDrive Forage Harvester.

So What?

Here are some examples of what Sick Codes found he could do with the access he gained to just about any John Deere machine:

Ruined Fields

Spraying. Think you’re spraying where you mean to? Not if your system was targeted. Your compromised system can now compromise your land, sometimes spraying 10x the amount of chemical in a given area that you meant. Maybe 10x less. The point is your land is now a target for biological warfare on domestic soil.

Machines Turned into Bricks or Weapons

You can’t access any part of your machine. Maybe the hacker renders it as useful as a cinder block in the middle of a field. Maybe they demand a ransom, and there is little law enforcement or John Deere can really do for you there. Maybe the hacker takes control of the vehicle, compromises the GPS, and decides to visit the neighbor’s field or the highway. Or the river.

Commodity and Carbon Credit Market Manipulation

Maybe all the hacker wants is your planting and harvest data. If you are one of the many farmers I know who somewhat purposefully fudge your USDA crop census survey, do appreciate that the combine won’t lie for you.

From an individual farm alone, the impact of these data depends on what the hacker does with them. If you depend on machinery data to cash in on the emerging carbon credits industry, you can easily lose all of it.

But what if, like Sick Codes, a hacker gained access to the whole network of John Deere machinery? That is enough real-time planting and harvest data to manipulate the commodities markets. From there, the possibilities can go many places.

Other Folks Involved

Sick Codes did not tackle this issue and understand its implications without actual subject matter experts. He partnered with two Midwest folks, Willie Cade and Kevin Kenney. Both are active in the right-to-repair dialogue going on right now.


Now What?

I feel like this covers the What and the So What, but there’s not much of a Now What. That is, unless John Deere is working behind the scenes with their own security folks trying to patch this stuff up RIGHT NOW. Honestly, this whole thing is egregious.

I’ve never been happier for folks still using their “dumb” machines to work fields. At this point, those folks are sincerely more secure in their ability to bring in their harvests this season, or any season, than their high-tech neighbors. I wish I could feel better about that.


Bottom Line

Here’s the frank truth. If your tractor connects to the Internet in ANY WAY, you are connected to the Internet. If you are connected to the Internet, digital security is your problem as much as physical security.

Blunt Final Thoughts on This Issue


John Deere’s response to these issues indicates to me that more hackers need to take an interest in agriculture. Farm machinery alone presents unique IoT challenges with high impact. Agriculture, I know, looks simple from the outside, but it is a highly complex, global system. Humanity’s entire existence depends on it to work and work well.

Hackers looking for high impact and interesting problems, here’s your sign.

Thinking Up React Native App Ideas to Inspire Expanding My Skills

Recently, I’ve started learning React Native to expand my use and practice of JavaScript. There’s no smooth connection between my data science work and mobile development, but I do like building stuff. These are 10 ideas I’ve mulled over that I think would be interesting to build:

01

A Gamer Friend Matching Platform

Would use a Tinder-like matching process to quickly browse and mutually approach people for anything from your next raid to turnip buying

02

End of Life Organizer

For prepping online accounts, financial info, etc., to be shared with specific people in your life at the end of life

03

Emergency Contact Alert System

Sends an immediate alert with your GPS coordinates, time, and contact info to designated contacts when triggered. It will also send an alert to local authorities with your contact information so they can follow up with you. Comparable to a Life Alert for women

04

Home Garden Planner

Data-centric approach to garden planning, which matches ideal produce to the soil and climate on record for your designated plot of land using open-source data on local soil and plat records

05

On-Demand Commercial Food Transportation Platform

Platform with on-demand jobs filling in smaller orders that can’t be taken care of by regular food transportation companies

06

Story Pitching Platform

For searching out authors, with the first three chapters of their book or the top of their short story available as a preview

07

Business Card Exchange Hub and Organizer

Gives you a canonical link to all your contact info and lets you do the same for others. Exchange info via link, phone tap, QR code, or even by scanning a physical business card. You could even create a physical business card template to send to an online business card printer like MOO

08

Online Course Platform for 6-12 Grade Electives

Aimed at high school and junior high kids, for learning subjects that are rarely or not well covered in most classrooms, like sex ed, Internet privacy, finance basics, and media literacy

09

Sewing and Needlework Pattern Sharing Platform

Similar to Ravelry, but focused on sewing and embroidery

10

Internet Data-Visualization Bullet Journal App

Focuses on acting as an API-calling app that reads in and displays your various goals


What is CRISP-DM?

At some point working in data science, it is common to come across CRISP-DM. I like to irreverently call it the Crispy Process. It is an old data science concept that has been around since the mid-1990s. This post is meant as a practical guide to CRISP-DM.

CRISP-DM stands for CRoss Industry Standard Process for Data Mining. The process model spans six phases meant to fully describe the data science life cycle.

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

CRISP-DM Process Diagram

This cycle comes off as an abstract process with little meaning if it cannot be grounded in some sort of practical example. That’s what this post is meant to provide. The following takes a casual scenario appropriate for many Midwestern gardeners about this time of year.

What am I planting in my backyard this year?

It is a vague, urgent question reminiscent of many data science client problems. In order to answer it, we’re going to use the Crispy Process in this practical guide to CRISP-DM.

First, we need a “Business understanding”. What does the business (or in this case, gardener) need to know?

Next, we have to form “Data understanding”. So, what data is going to cover our first needs? What format do we need that data in?

With our data found, we need to do “Data preparation”. The data has to be organized and formatted so it can actually be used for whatever analysis we’re going to use for it.

The fourth phase is the sexy bit of data science, “Modeling”. There’s information in those hills! Er…data. But we need to apply algorithms to extract that information. I personally find the conventional title of this fourth phase somewhat confusing in contemporary data science. In colloquial conversations with fellow data professionals, I wouldn’t use “Modeling” but rather “Algorithm Design” for this part.

“Evaluation” time. We have information. As one of my former team leads would ask at this stage, “You have the what. So what?”

Now that you have something, it needs to be shared, which brings us to the “Deployment” stage. Don’t ignore the importance of this stage!

I can pick out who is a new professional and who is a veteran by how they feel about this part of a project. Newbies have put so much energy into Modeling and Evaluation that Deployment feels like an afterthought. Stop! It’s a trap!

For the rest of us, “Deployment” might as well be “What we’re actually being paid for”. I cannot stress enough that all the hours, sweat, and frustration of the previous phases will be for nothing if you do not get this part right.

Business understanding: What does the business need to know?

We have a basic question from our gardener.

What am I planting in my backyard this year?

To get a full understanding of what they need in order to take action as a gardener and plant their backyard this year, we need to break this question down into more specific, concrete questions.

Whenever I can, I want to learn as much as possible about the client’s context. This does not necessarily mean I want them to answer “What data do you want?” It is also important to steer a client away from preconceived notions of the project’s end result. Hypotheses can dangerously turn into premature predictions, and into disappointment when reality does not match those expectations.

Rather, it is important to appreciate what kind of piece you’re creating for the greater puzzle your client is putting together.

About the client

I am a Midwestern gardener myself so I’m going to be my own customer.

Gardening hobbyist who wants to understand the plants best suited to a given environment. The environment to consider is the American Midwest, the client’s location. Their favorite color is red, and they like the idea of bits of showy red in their backyard. Anything that is low maintenance is a plus.

For this client, they would prefer to keep the data simple and shareable with other hobbyists. Whatever data we get should be verified for what it does and does not have, as the client is skeptical of a dataset’s true objectivity.

Data understanding: What data is going to cover our needs?

One trick I use to try to objectively break down complex scenarios in real life is to sift the business problem for distinct entities and use those to push my data requirements.

We can infer from the scenario that the minimal set of entities includes the gardener and the plant. As the gardener is presumably a hobbyist and probably doesn’t have something like a greenhouse at their disposal, we can also presume that their backyard is another major entity, one that is made of dirt and is a location. It is outside, so other entities at play include the weather, which is also dependent on location. Additionally, the client cares about a plant’s hardiness and color.

So we know we have at least the following entities to address:

  1. Gardener
  2. Plant
  3. Location
  4. Dirt
  5. Weather
  6. Plant Hardiness
  7. Plant Color

The Gardener is our client and is seeking to gain knowledge about what is outside their person. So we can discard them as an essential data point. 

The plant can be anything. It is also the core of our client question. We should find data that is plant-centric for sure.

Location is essential because that can dictate the other entities I’m considering like Dirt and Weather. Our data should help us figure out these kinds of environmental factors.

Additionally, we need to find data that could help us find the color and hardiness of a plant.

There are many datasets for plants, especially for the US and UK. Our client is American so American-focused datasets will narrow our search. 

The USDA has a robust plant database, as seen in the screenshot below. It includes locations where the plants are best suited, which is incredibly useful. We can get information on plant uses, history, and management.

USDA Plant Finder page

It has several issues, though, with respect to our needs. One of the most glaring is location. While it does have state information, a single state in the United States can be larger than entire countries in other parts of the world. A state can cover multiple geography types, so our concerns about issues like weather are not accounted for in this dataset.

Perhaps ironically, the USDA does have a measuring system for handling geographically based plant-growing environments: the USDA Plant Hardiness Zones.

USDA Plant Hardiness Zones are so prevalent that they are what American gardeners typically use to shop for plants. Given that our client is an American gardener, it is going to be important to grab that information. Below is an example of an American plant store describing the hardiness zone for the plant listed.

Burpee Seeds plant listing

American institutions dedicated to plants and agriculture are not limited to just the federal government. In the Midwest, the Missouri Botanical Garden has its own plant database, which shows great promise.

MOBOT Plant Finder page

Not only does it have hardiness zones, it also has data to evaluate water needs and a color-based bloom description. Additional searching through the data also allows us to consider the dirt quality a plant tolerates, as well as other extreme scenarios like drought.

MOBOT Plant Finder filter options

The way the current data is set up, we could send it on to our client, but we would have no way of helping them verify exactly what this dataset does and does not have. We only know what it could have (drought-resistant, flowering, etc.), but not how many entries.

We’re going to have to extract this data out of MOBOT’s website and into a format we can explore in something like a Jupyter notebook.

Data preparation: How does the data need to be formatted?

Getting the Data

The clearest first step is that we need to get that data out of MOBOT’s website.

In Python, this is a straightforward process using the popular library Beautiful Soup. The following presumes you are using some version of Python 3.x.

The first thing we need is a systematic way of crawling all the individual web pages with plant entries. Luckily, for every letter in the Latin alphabet, MOBOT has a web page that uses the following URL pattern:

https://www.missouribotanicalgarden.org/PlantFinder/PlantFinderListResults.aspx?letter=<LETTER>

So for every letter in the Latin alphabet, we can loop through all the links on all the web pages we need.

The following is how I tackled this need. To just go straight to the code, follow this link.

import requests
from bs4 import BeautifulSoup

def find_mobot_links():
    """Save the link for every plant entry, one CSV of links per letter."""
    alphabet_list = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

    for letter in alphabet_list:
        file_name = "link_list_" + letter + ".csv"
        g = open("mobot_entries/" + file_name, 'w')

        url = "https://www.missouribotanicalgarden.org/PlantFinder/PlantFinderListResults.aspx?letter=" + letter

        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')

        # Every plant entry link on a results page has an id with this prefix.
        for link in soup.find_all('a', id=lambda x: x and x.startswith("MainContentPlaceHolder_SearchResultsList_TaxonName_")):
            g.write(link.get('href') + "\n")
        g.close()

Now that we have the links we need, let’s visit them and extract data from them. Web page scraping is a process of trial and error. Web pages are diverse and often change. The following grabbed the data I needed and wanted from MOBOT, but things can always change.

import re
import time

import requests
from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector

def scrape_and_save_mobot_links():
    """Visit every saved link and write each plant entry's HTML to its own file."""
    alphabet_list = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

    for letter in alphabet_list:
        file_name = "link_list_" + letter + ".csv"

        with open("./mobot_entries/" + file_name, 'r') as f:
            for link_path in f:
                url = "https://www.missouribotanicalgarden.org" + link_path.strip()

                html_page = requests.get(url)
                # Prefer the encoding declared inside the HTML itself;
                # fall back to the one in the HTTP Content-Type header.
                http_encoding = html_page.encoding if 'charset' in html_page.headers.get('content-type', '').lower() else None
                html_encoding = EncodingDetector.find_declared_encoding(html_page.content, is_html=True)
                encoding = html_encoding or http_encoding
                soup = BeautifulSoup(html_page.content, 'html.parser', from_encoding=encoding)

                # Name each output file after the plant's title, stripped of non-word characters.
                entry_name = str(soup.title.string).replace("  – Plant Finder", "")
                out_name = re.sub(r'\W+', '', entry_name)

                g = open("mobot_entries/scraped_results/" + out_name + ".txt", 'w')
                g.write(entry_name + "\n")
                g.write(str(soup.find("div", {"class": "row"})))
                g.close()

                print("finished " + out_name)

        # Pause between letters to avoid hammering MOBOT's servers.
        time.sleep(5)

Side note: A small, basic courtesy is to avoid overloading websites serving the common good, like MOBOT, with a barrage of activity. That is why a timer runs between each letter’s batch of requests.

Transforming the Data

With the data out and in our hands, we still need to bring it together in one convenient file that we can examine all at once using another Python library, pandas. The method is relatively straightforward and is also already on Github if you would like to just jump in here.

Because our previous step got us almost everything we could possibly get from MOBOT’s Plant Finder, we can pick and choose just the columns we really want to deal with in a simple, flat CSV file. You may notice the code allows for the near-constant instances where a column we want to fill in doesn’t have a value for a given plant. We just have to work with what we have, as sketched below.

Ultimately, the code pulls Attracts, Bloom Description, Bloom Time, Common Name, Culture, Family, Flower, Formal Name, Fruit, Garden Uses, Height, Invasive, Leaf, Maintenance, Native Range, Noteworthy Characteristics, Other, Problems, Spread, Suggested Use, Sun, Tolerate, Type, Water, and Zone.
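
To give a flavor of that transformation, here is a minimal sketch of the idea, assuming the file layout the scraper above produces (first line holds the plant title, the rest is saved HTML). The function name, the label-matching logic, and the shortened column list are hypothetical stand-ins for the actual code in the repository:

import glob

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical subset of the full column list above.
WANTED_COLUMNS = ["Common Name", "Zone", "Bloom Description", "Maintenance", "Water"]

def build_flat_csv(scraped_dir="mobot_entries/scraped_results", out_file="mobot_plants.csv"):
    rows = []
    for path in glob.glob(scraped_dir + "/*.txt"):
        with open(path) as f:
            formal_name = f.readline().strip()  # first line holds the plant title
            soup = BeautifulSoup(f.read(), "html.parser")

        row = {"Formal Name": formal_name}
        for label in WANTED_COLUMNS:
            # Find the element whose text is exactly the label, then treat its
            # next sibling as the value; a missing attribute leaves an empty cell.
            cell = soup.find(lambda t, lab=label: t.get_text(strip=True).rstrip(":") == lab)
            sibling = cell.find_next_sibling() if cell else None
            row[label] = sibling.get_text(strip=True) if sibling else ""
        rows.append(row)

    pd.DataFrame(rows).to_csv(out_file, index=False)

The empty-cell fallback is what lets us work with whatever each entry happens to have.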

That should get us somewhere!

Modeling: How are we extracting information out of the data?

I am afraid there isn’t going to be anything fancy happening here. I do not like doing anything complicated when it can be straightforward. In this case, we can be very straightforward. For the entirety of my data analysis process, I encourage you to go over to my Jupyter notebook for more: https://github.com/prairie-cybrarian/mobot_plant_finder/blob/master/learn_da_mobot.ipynb
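
To give a sense of how simple the filtering can be, here is a hedged sketch. It assumes the flat mobot_plants.csv from the previous step, the column names listed earlier, and USDA zone 5 as a stand-in for the client’s Midwest location; the notebook’s actual code may differ:

import re

import pandas as pd

df = pd.read_csv("mobot_plants.csv")

def covers_zone(zone_text, target=5):
    # MOBOT zone fields read like "4 to 8"; treat the first and last
    # numbers found as the bounds of the hardiness range.
    nums = [int(n) for n in re.findall(r"\d+", str(zone_text))]
    return bool(nums) and nums[0] <= target <= nums[-1]

# Red blooms, low maintenance, hardy in the client's zone.
red_and_easy = df[
    df["Bloom Description"].str.contains("red", case=False, na=False)
    & df["Maintenance"].str.contains("low", case=False, na=False)
    & df["Zone"].apply(covers_zone)
]

print(red_and_easy["Common Name"].tolist())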

The most important part is the results of our extracted information:

  • Chinese Lilac (Syringa chinensis Red Rothomagensis)
  • Common Lilac (Syringa vulgaris Charles Joly)
  • Peony (Paeonia Zhu Sha Pan CINNABAR RED)
  • Butterfly Bush (Buddleja davidii Monum PETITE PLUM)
  • Butterfly Bush (Buddleja davidii PIIBDII FIRST EDITIONS FUNKY …)
  • Blanket Flower (Gaillardia Tizzy)
  • Coneflower (Echinacea Emily Saul BIG SKY AFTER MIDNIGHT)
  • Miscellaneous Tulip (Tulipa Little Beauty)
  • Coneflower (Echinacea Meteor Red)
  • Blanket Flower (Gaillardia Frenzy)
  • Lily (Lilium Barbaresco)

Additionally, we have a simple CSV we can hand over to the client. I will admit that, as far as clients go, I am easy. It’s almost like I can read my own mind.

Evaluation: You have the what. So what?

In some cases, this step is simply done. We have answered the client’s question. We have addressed the client’s needs. 

Yet, we can still probably do a little more. In the hands of a solid sales team, this is the time for the upsell. Otherwise, we are in scope-creep territory. 

Since I have a good relationship with my client (me), I’m going to at least suggest the following next steps. 

Things you can now do with these new answers:

  • Cross-reference the soil preferences of our listed flowers with the actual location of the garden using the USDA Web Soil Survey’s data (https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm).
  • Identify potential consumer needs of the client in order to find and suggest seed or plant sources for them to purchase the listed flowers.

Deployment: Make your findings known

Personal experience has shown me that deployment is largely an exercise in client empathy. Final delivery can look like so many things. Maybe it is a giant blog post. Maybe it is a PDF or a PowerPoint. So long as you deliver in a format that works for your user, it does not matter. All that matters is that it works.