Detect EC2 instances without SSM enabled

So you have a fleet of EC2 instances running, and you need to patch them with SSM. You deploy the patches, but somehow, you missed some instances. It turns out that the SSM agent is not running on all your EC2 instances. This could be a disaster.

There is no direct way to find which EC2 instances are missing the agent. I created a little Python script, wrapped in a Lambda function, that runs on any cron schedule you define. If it detects any missing SSM agents, it sends you an alert via your Slack channel. The entire solution is wrapped in a CloudFormation template that you can easily deploy.
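
For context, the core of the check is a simple comparison between the instances EC2 knows about and the instances SSM is actually managing. Here is a minimal sketch of that idea (not the exact code from the template; SLACK_WEBHOOK is a placeholder environment variable):

import boto3
import json
import os
import urllib.request

def find_unmanaged_instances(region):
    ec2 = boto3.client('ec2', region_name=region)
    ssm = boto3.client('ssm', region_name=region)

    # -- all running EC2 instance ids
    running = set()
    for page in ec2.get_paginator('describe_instances').paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]):
        for res in page['Reservations']:
            for i in res['Instances']:
                running.add(i['InstanceId'])

    # -- instance ids that SSM is actually managing
    managed = set()
    for page in ssm.get_paginator('describe_instance_information').paginate():
        for i in page['InstanceInformationList']:
            managed.add(i['InstanceId'])

    # -- anything running but not managed is missing the agent
    return running - managed

def lambda_handler(event, context):
    missing = find_unmanaged_instances(os.environ.get('AWS_REGION', 'us-east-1'))
    if missing:
        # SLACK_WEBHOOK is a placeholder - point it at your incoming webhook URL
        msg = {'text': 'EC2 instances without a working SSM agent: ' + ', '.join(sorted(missing))}
        req = urllib.request.Request(os.environ['SLACK_WEBHOOK'],
                                     data=json.dumps(msg).encode(),
                                     headers={'Content-Type': 'application/json'})
        urllib.request.urlopen(req)
    return {'missing': sorted(missing)}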

  • To use the CloudFormation template, you will need to have a Slack webhook already configured. If you haven’t done that yet, do that first and record the webhook URL somewhere; we’ll use it in a sec.
  • Download the CloudFormation template
  • Log onto the AWS Console and change to the region where you’d like the function to run.
    • Yes, if you want to run the monitor in multiple regions, you will need to deploy the Lambda functions to those regions.
  • Create a new stack in the CloudFormation console. Use the file you downloaded above as the template.

You will be presented with the parameters screen.

  • Stack Name – Give a descriptive name – this is entirely up to you
  • SlackWebHook – Provide the Slack Webhook – if you haven’t set this up yet, go ahead and do that first (of course, this all assumes that you’re actually using Slack!)
  • additional – This is some additional text that will be added to the Slack message. In case you run the function on multiple accounts and multiple regions, you may want to specify where this is coming from.
  • cron – By default, the Lambda function will trigger daily at 12:00. You can modify this to suit your own requirements.
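
On the cron parameter: I can’t speak for the exact format without checking the template’s parameter description, but if it accepts a standard EventBridge schedule expression (an assumption on my part), a daily 12:00 UTC trigger looks like this:

cron(0 12 * * ? *)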

Click Next, then click Next again. Right at the bottom, make sure to select the checkbox against the “I acknowledge that AWS CloudFormation might create IAM resources” option. The template needs this permission, as it creates a role that allows the Lambda function to retrieve the EC2 and SSM data. Click on “Create Stack” to create the stack.

And that’s it! Provided the Slack webhook is set up correctly, you will receive a Slack alert like this one.

I hope you find this helpful. If you have any feedback or suggestions, do let me know, either by opening an issue on GitHub or by providing feedback below.

AWS Security Info – August 2021 update

It’s been a bit of a quiet month for updates to the AWS Security Info modules. There have been a couple of changes that I’m publishing today.

New features

  • We now support Organizations! That’s right: if you point the script at the master account and specify the --organization parameter with the name of your organizational role, the script will interrogate every account in your organization.
  • The --regions flag allows you to specify the regions you operate in, thus reducing the total number of API calls being made.
  • Managed AWS policies’ get_policy_version data is now added to the initial.json file and fed into the data load on first load. This speeds up the data collection significantly by reducing the number of API calls for managed AWS policies, which do not change very often. Should AWS make changes to their policies, simply delete the initial.json file and let the script run through it once.
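
As a usage sketch of the new flags (the role name and regions below are placeholders, and the exact argument format may differ slightly — check the script’s --help):

python3 aws-security/scanner/scanner.py --organization OrganizationAccountAccessRole --regions us-east-1 ap-southeast-2 --json output.json --html output.html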

Bug fixes

  • Fixed an issue where using --assumerole with an empty --externalid caused the sts module to fail.
  • Fixed checkVersion incorrectly flagging newer boto3 versions as outdated.

Data collection

  • AWS SSO has been added. Note that some aspects are still missing (such as identitystore data, and visibility of MFA settings for SSO users).

Policy updates

AWS Security Configuration Scanner

Large enterprises tend to invest in CSPM (Cloud Security Posture Management) systems like Dome9, PrismaCloud, or Orca Security. For smaller companies, it may be cost prohibitive to invest in a CSPM, so they tend to simply do nothing and hope they don’t have any breaches. This is a dangerous place to be in.

Let’s assume you do look at tools like Trusted Advisor once in a while. It will show you some of the big ticket items you need to look at, but it doesn’t go into a lot of detail. That’s where the AWS Security Info Configuration Scanner comes in. The scanner is a Python project I’ve been working on for the past 2 years, and it is finally ready for release.

As the title implies, it’s a tool you can use to scan the configuration of your AWS account. It has a number of built-in security controls that will give you an overview of where the security issues in your AWS account could be. At a high level, the AWS CIS Foundations Benchmark was used as the basis for the majority of security controls.

I can already hear some of you saying: why should I use this script, when I can simply use Security Hub? And you’d be right — you could use Security Hub (and in fact, I highly recommend it!). The big difference is that with Security Hub you’ll have Config rules set up, and Config will incur additional charges. This is not necessarily a bad thing. The problem, however, is that Security Hub will keep generating alerts, and unless you’re actively monitoring them, the alerts will simply go into a black hole, never to be seen again.

Why use this script then? I view it more as an audit tool. It can generate a point-in-time snapshot of the security configuration of your AWS account, and the output can then be used by auditors to discuss and challenge the findings with the various cloud security architecture teams.

How to use it

I would recommend that you run the script from the us-east-1 region. Since this is the central region for AWS (where the global IAM endpoints live), most of the API calls will occur against this region, so it’s recommended that you use either CloudShell or a Spot instance in that region to run the script.

Shell

Fire up the CloudShell in the us-east-1 region. Assuming you have ReadOnly access to the AWS account, simply execute the following lines of code

git clone https://github.com/massyn/aws-security
pip3 install boto3 --upgrade
python3 aws-security/scanner/scanner.py --json output.json --html output.html

That’s it! The script should start running. Depending on the size of your environment, it may take about 30 minutes to run, maybe more. When it’s done, you can send the output.json and output.html files to an S3 bucket.
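
For example, from the same CloudShell session (replace the bucket name with your own):

aws s3 cp output.json s3://my-output-bucket/
aws s3 cp output.html s3://my-output-bucket/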

Spot instance

This is a work in progress. I have been successful in running a spot instance to execute the script. I am busy packaging the solution, and will update this blog post once it is ready. Essentially, you need to :

  • Create an IAM instance profile (role) that has read-only access to the entire AWS account, and write access to a specific S3 bucket.
  • Spin up an EC2 Spot instance with a public IP address, attach the instance profile to the Spot instance, and run the same commands as mentioned above.
  • When the script is done, copy the generated files to an S3 bucket, and destroy the EC2 instance.
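
Until the packaged solution is ready, a rough user-data sketch along those lines might look like this (assuming Amazon Linux 2; the bucket name is a placeholder, and the instance profile must already grant the read-only and S3 write permissions mentioned above):

#!/bin/bash
yum install -y git python3 python3-pip
pip3 install boto3 --upgrade
git clone https://github.com/massyn/aws-security
python3 aws-security/scanner/scanner.py --json output.json --html output.html
# copy the results to your own bucket
aws s3 cp output.json s3://my-output-bucket/
aws s3 cp output.html s3://my-output-bucket/
# shut the instance down when done (set the shutdown behaviour to terminate, or terminate it yourself)
shutdown -h now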

Operation

The script connects to AWS using the default credentials and starts to interrogate each of the services to retrieve the data. This is where the JSON file comes in. When it’s done with the data extraction, you’ll have a single JSON file that contains most of the system configuration that has been defined on your AWS account. This has huge implications. If you’re interested in digging through the config, you can write your own JMESPath queries to retrieve anything your heart desires.
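
If you want to try that, here is a small sketch using the jmespath library. The exact keys depend on how the collector names each service in the output file, so treat the expression below as a hypothetical example:

import json
import jmespath

with open('output.json') as f:
    data = json.load(f)

# hypothetical query: pull every bucket name out of the collected S3 data
# (adjust the path to match the actual structure of your output.json)
buckets = jmespath.search('s3.list_buckets.Buckets[].Name', data)
print(buckets)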

Once the json file has been created, the policy parser kicks in. It will read through the json file, looking for the logic that has been predefined in the script, and then generating a report (in HTML format) of all the findings.

Hidden features

When specifying the output file names (--json, --html), you can specify %a (for the accountId) or %d for the date. This allows you to have a batch file or a shell script you can run against a number of accounts, and it will keep a file per account, per date.
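
For example:

python3 aws-security/scanner/scanner.py --json output-%a-%d.json --html output-%a-%d.html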

You can also request the cloud team to run the JSON extract for you. Once you have the JSON file, you can parse the output yourself using the --nocollect option. This will skip the data collection step, read the provided --json file, and parse the security rules.
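
For example, to parse a JSON file that was collected for you:

python3 aws-security/scanner/scanner.py --nocollect --json their-extract.json --html report.html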

Did you know you can specify an S3 path for the html or json files? That’s right! You can store the HTML file directly to S3!

It supports Slack! That’s right. You can specify the --slack option with your webhook ID to get the report straight in your Slack feed.
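
Putting those two together, something like this should write the report straight to S3 and post a summary to Slack (the bucket and webhook values below are placeholders — check whether --slack wants the full webhook URL or just the ID):

python3 aws-security/scanner/scanner.py --json s3://my-bucket/output-%a-%d.json --html s3://my-bucket/output-%a-%d.html --slack https://hooks.slack.com/services/T000/B000/XXXX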

Known issues

  • The script does support the ability to run AssumeRole and connect to another account that it has permissions for. The issue, however, is that the provided credentials are only valid for 1 hour. If your data collection takes more than an hour to run, the script will start failing. As a workaround, the JSON file is constantly updated, so simply restarting the process will allow the script to continue where it left off and complete.
  • Not all use-cases could be tested. There is a chance that some data collection or policy parsing would fail, simply because I wasn’t able to test it. For example, I do not have access to Direct Connect on my lab system (and I’m really not going to request a dedicated leased line just for that), so there is a possibility that you may have some failures as a result of that. If so, simply open a case on the Github Issue log, and let me know about the issue, so I can resolve it.
  • The script takes too long to run as a Lambda function.
  • Data ingestion for CloudFront list_functions may fail. If so, update your boto3 Python library.

What’s next?

This is where you come in. The main driver for this project is to give something back to the AWS community, to make AWS a more secure environment for its customers. Some of the things I’d like to still do are:

  • Fix the Lambda function. This will require decoupling the script, letting multiple Lambda functions run to collect the data from multiple regions, and possibly storing the data in DynamoDB.
  • Add more policies. Do you have some ideas? Log them in the Github Issue log.
  • Add multi-threading. When connecting to individual regions, do it in a multi-threaded manner to speed up the data collector (see the sketch after this list).
  • Build a web Frontend. I am playing with the idea of turning the script into a full-blown CSPM solution.
  • Improve the policy engine. I’ve started to convert the policies into jmespath queries. This is going to take a while. Any new policies (if feasible) will be added into a new policy configuration file.
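
On the multi-threading point, the idea would be something along these lines — a minimal sketch, assuming a collect_region function that gathers data for one region (not the actual scanner code):

from concurrent.futures import ThreadPoolExecutor
import boto3

def collect_region(region):
    # placeholder for the real per-region collector
    ec2 = boto3.client('ec2', region_name=region)
    return region, ec2.describe_instances()

regions = ['us-east-1', 'ap-southeast-2', 'eu-west-1']

# run the per-region collectors concurrently instead of one after the other
with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    results = dict(pool.map(collect_region, regions))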

Community Support

The project is hosted in GitHub, and being in GitHub means that you can fork your own copy of the code, and adjust it. All I ask is that you give credit, and that you contribute to the overall project with source code suggestions, or new policies you’d like to see.

https://github.com/massyn/aws-security

Bayesian Average

The Bayesian Average is a mathematical formula used to derive an average for a data set when the data set may be small. Typically you’ll see the Bayesian average used on sites like Yelp.

Let’s assume for a moment there are a number of restaurants, with various ratings across the board. Each of them shows a rating from 1 to 5 stars. A new restaurant enters the site and receives a single 5-star rating. If you were to consider all ratings only as a plain average, this new restaurant, with its single rating, would now be considered the highest rated restaurant in the entire city, which may not be entirely accurate.

This is where the Bayesian Average comes in. In essence, the number of ratings (or votes) has an influence on the total outcome. It may be better to illustrate this with an example. Here is our data set. Every line contains the result of an individual vote. Right at the end, you’ll see Luigi’s, which had a single rating of 5.

Vote      Rating
Mario     4
Brando    4
Rocco     3
Franco    2
Mario     3
Brando    2
Rocco     5
Franco    5
Mario     2
Brando    2
Rocco     3
Mario     3
Brando    3
Luigi     5

Step 1 – We need to tally up the totals. Average the rating for each restaurant (this_rating), and count how many ratings (this_num_votes) each restaurant received.

Vote      this_rating   this_num_votes
Mario     3.00          4
Brando    2.75          4
Rocco     3.67          3
Franco    3.50          2
Luigi     5.00          1

Step 2 – Calculate the average rating (avg_rating) and the average number of votes (avg_num_votes) by averaging the totals received from step 1. In this example avg_rating = 3.58 and avg_num_votes = 2.80

Step 3 – With all of this information, we can now calculate the bayesian average for each restaurant. The formula for the bayesian average is :

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)
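
For example, plugging in Luigi’s numbers from the tables above: br = ( (2.80 × 3.58) + (1 × 5.00) ) / (2.80 + 1) ≈ 3.96.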

This leaves us with the following result. Even though Luigi’s only has one vote, the Bayesian Average comes in at 3.96. While it is still higher than all of his competitors, it is more realistic considering the total number of votes that have been received.

Vote      this_rating   this_num_votes   Bayesian
Mario     3.00          4                3.24
Brando    2.75          4                3.09
Rocco     3.67          3                3.63
Franco    3.50          2                3.55
Luigi     5.00          1                3.96

Python example

This example will demonstrate how you can calculate the Bayesian Average using Python.

import json
# Bayesian Average example in Python

# Step 0 - Feed the data set with the data we are interested in
data_set = [
	{'vote' : 'Mario'	, 'rating' : 4},
	{'vote' : 'Brando'	, 'rating' : 4},
	{'vote' : 'Rocco'	, 'rating' : 3},
	{'vote' : 'Franco'	, 'rating' : 2},
	{'vote' : 'Mario'	, 'rating' : 3},
	{'vote' : 'Brando'	, 'rating' : 2},
	{'vote' : 'Rocco'	, 'rating' : 5},
	{'vote' : 'Franco'	, 'rating' : 5},
	{'vote' : 'Mario'	, 'rating' : 2},
	{'vote' : 'Brando'	, 'rating' : 2},
	{'vote' : 'Rocco'	, 'rating' : 3},
	{'vote' : 'Mario'	, 'rating' : 3},
	{'vote' : 'Brando'	, 'rating' : 3},
	{'vote' : 'Luigi'	, 'rating' : 5}
]

# Step 1 - Tally up the totals
totals = {}
for d in data_set:
    # -- setup the dictionary
    if not d['vote'] in totals:
        totals[d['vote']] = { '_total' : 0, 'this_num_votes' : 0, 'this_rating' : 0.0 , 'bayesian_average' : 0 }

    # -- start counting the individual results 
    totals[d['vote']]['this_num_votes'] += 1
    totals[d['vote']]['_total'] += d['rating']
    totals[d['vote']]['this_rating'] = totals[d['vote']]['_total'] / totals[d['vote']]['this_num_votes']

# Step 2 - Calculate the averages
count = 0
avg_rating_total = 0
avg_rating = 0
avg_num_votes_total = 0
avg_num_votes = 0

for d in totals:
    count += 1
    
    # == calculate avg_rating
    avg_rating_total += totals[d]['this_rating']
    avg_rating = avg_rating_total / count
    
    # == calculate avg_num_votes
    avg_num_votes_total += totals[d]['this_num_votes']
    avg_num_votes = avg_num_votes_total / count  

# Step 3 - Calculate the Bayesian Average
for d in totals:
    totals[d]['bayesian_average'] = ( (avg_num_votes * avg_rating) + (totals[d]['this_num_votes'] * totals[d]['this_rating']) ) / (avg_num_votes + totals[d]['this_num_votes'])
    print('{vote} = {br}'.format(vote = d, br = totals[d]['bayesian_average']))

# Step 4 - Show the data we have collected, including the Bayesian Averages
print(json.dumps(totals,indent=4))

Things I don’t like about AWS

Full disclaimer – I am an Amazon Web Services fanboy. I love their cloud offering, and I proudly hold 3 AWS certifications. Through my day job, I am also getting exposed to Azure. Yes, I know – Azure is a swear word amongst Amazonians, but the reality is that many companies do dabble in multi-cloud strategies. Some cloud providers are better at some things than others, and some features are just nicer than others, so with that, I decided to start putting together a list of some of the cool (and not so cool) features I have spotted on both platforms.

Having said that – because I love the AWS service, I also feel it is my duty to point out where I think they need to improve their service. Even though Gartner puts them as a leader in the cloud space, there are still some things I think they can improve.

This blog post will be updated from time to time, so do come back to see the updated list. Do you have some items you’d like to add? Post them in the comments.

See all resources in one screen
When you log onto Azure, you are able to see different resources from every region, all on one page. This is great when you are playing around in the cloud platform: you can simply go and delete it all when you’re done.

In AWS? No. You have to switch to the region, and then switch to the specific service, to see what is in there. So if you’re playing around and learning new services, do remember to go and clean it up afterwards, or you may end up with bill shock!

Generate Infrastructure-as-Code (IaC) templates
Here Azure is also leading. You can build your environment, and right from the console you can generate an ARM template. This is a great way to develop, package, and then deploy a consistent infrastructure to your production platform.

Sadly, AWS does not offer this. CloudFormation is good for deploying resources, but there is no tool to analyze a cloud account and generate CloudFormation templates from it. This is unfortunate, as Azure makes it very easy with pre-generated templates to help developers adopt the IaC mindset.

Administer a database from the portal
When you’re using a PaaS-style database, be it Aurora or RDS, sometimes you need to poke a few SQL commands at the database. Azure offers a SQL Query Analyzer-style interface where you can log onto your SQL database straight from the Azure portal. AWS, however, does not have this. It is always a hassle to spin up a separate EC2 instance, configure security groups, install a web server, and install phpMyAdmin. Surely something as common as administering a SQL server could be a basic service offered by AWS.

Websites from storage linked to a domain name
Hosting websites from S3 is a great feature. You can load all the HTML, JavaScript, CSS, and anything else your website may require into an S3 bucket, and then turn that S3 bucket into a website. But when you want to attach your own domain name to an S3-hosted website, you’ll find the only way to achieve this is to attach the S3 bucket to CloudFront, AWS’ CDN solution. CloudFront allows you to attach your own domain name to it, so while it would’ve been nice if S3 supported custom domains without the use of CloudFront, you, the customer, will have to cough up additional cash for Amazon to copy your content over to CloudFront and serve it all over the world.

I will add that CloudFront is a great service if you need to serve content all over the world and you don’t mind spending a few extra dollars on hosting. Smaller businesses tend to operate within a geographical region, so using CloudFront in front of an S3 bucket may not always make sense. There are ways to achieve caching of content through HTTP headers without the need for a CDN.

Update on 2021.06.20 – If you create the S3 bucket with exactly the same name as the domain you’d like to host, then you’re able to use S3 hosting by pointing your domain name at the S3 website URL with a CNAME. See this link for more details.
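
As a hypothetical example (the domain is made up, and the exact S3 website endpoint format varies by region):

Bucket name : www.example.com  (with static website hosting enabled)
DNS record  : www.example.com  CNAME  www.example.com.s3-website-ap-southeast-2.amazonaws.com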

Using AWS Lambda + API Gateway from Javascript

If you’ve been working with AWS for a while, you’ll know that Lambda functions are where it’s at.  Lambda is AWS’ serverless offering, allowing you to run code in the cloud without having to worry about infrastructure.

Lambda is really powerful.  The real advantage is its complete integration with AWS IAM.  Once you’re familiar with how policies work, the ability to lock the process down to the least privileges required becomes a breeze.

Let’s run through an example.  In this use case, we’ll create a simple HTML page, that will generate a random message that is retrieved via API Gateway from a Lambda function written in Python.

Create a Lambda function

  • If you haven’t done so yet, log onto the AWS Console.
  • You may want to switch to your favourite region.
  • Open the Lambda interface (you can just search for Lambda)
  • Click on the button “Create Function”
  • Select the following options
    • We will Author from Scratch
    • Function name : simpleMessage
    • Runtime : Python 3.8 (if there’s a later version, that may likely work as well)
    • For the purpose of this example, we’ll let Lambda create the role for us.
  • Click Create Function

The core function will be created, with the associated role.  There’s still no code in the function, so it’s pretty useless at the moment.  Scroll down in the console, and you should see the block where the source code can be edited.  Open the python script in a separate tab, and copy the source code into the lambda function, overwriting everything that is in there.  When you’re done, hit the Save button.
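
If you’d rather not open the repo right now: the handler is essentially a random-message generator. Here is a minimal sketch that produces the same shape of response as the test output further down (the extra messages are made up for this example):

import json
import random

def lambda_handler(event, context):
    # pick a random message to return to the caller
    messages = [
        'A sloth explodes your homie',
        'A llama salutes your keyboard',
        'A wombat applauds your deployment'
    ]
    return {
        'statusCode': 200,
        'body': json.dumps({'message': random.choice(messages)}),
        # allow the page to call this API from any origin (CORS)
        'headers': {'Access-Control-Allow-Origin': '*'}
    }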

Test the function

Let’s test it to make sure the Lambda actually works correctly.  Click the Test button.  If you haven’t created a test event yet, do so now.  Give the test event a name, and click create.  Once it has been created, you can select the test event in the dropdown, and click “Test”

Did you see the message “Execution result: succeeded”? If so, congratulations!  The code executed fine.  You can open the details to see the result, and it should look similar to this:

{
	"statusCode": 200,
	"body": "{\"message\": \"A sloth explodes your homie\"}",
	"headers": {
		"Access-Control-Allow-Origin": "*"
	}
}

Create the API

So on its own, this Lambda function will just sit there until you explicitly invoke it.  We will now create the API Gateway.  The API Gateway is a front end that allows Lambda functions to be called from external sources.

Scroll down on the Lambda function, and you should see a screen where you can click “Add Trigger”.  Click on Add Trigger, and select API Gateway.

  • Select Create an API
  • For API Type, select REST API
  • For Security, select Open
    • Do note that Open means that ANYONE can call the API.  It is not advised to operate in this mode on a production system.
  • Click Add

The API will now be created.  When it’s done, you should see the endpoint URL on the same screen.  It should look similar to this.

https://gd19ykrcka.execute-api.ap-southeast-2.amazonaws.com/default/simpleMessage

You should be able to click on your link.  If everything worked correctly, you should be able to see a message in your browser.

Include the API on a web page

Now that the API has been created, you can invoke it from a web page.  Use the index.html page from the repo.  Update the api variable and replace the contents with the URL from your REST API.

Save the file to your drive, and open it through your browser.  If it all works ok, you should see a message being called through JavaScript, invoking the API Gateway, and executing your Lambda function.

What’s next?

The code works.  It does the job, although it is quite basic.  CORS is handled very loosely in the Lambda function (you will notice the CORS headers are returned directly in the Lambda response).  You may decide not to do it this way, and instead control CORS through the API Gateway, with stricter control.

In an upcoming blog post, I will show you how you can create a user registration page with Cognito, and control access to API Gateway for authenticated users.

Don’t forget to delete the API Gateway, and the Lambda function when you’re done, to avoid unnecessary charges on your account.

Let me know if you succeeded with the example.  I’d love to hear what you’ve been able to achieve with serverless in AWS.