Ed Elliott's blog


gdpr - panic part 4

  • Posted on: 8 February 2018
  • By: Ed Elliott

In the first part of this series, we looked at where to find out more information about GDPR in the UK (hint: the ICO website, ico.org.uk, has everything you need). In the second part, we looked at some historical enforcement action by the ICO against companies who had made a mistake with their security and the data they were responsible for. In the third part, we looked at opsec failures from a few different angles, such as physical access to a server room that wasn't monitored by CCTV and how that led to a £150,000 fine for RSA. We also looked at Keurboom Communications Ltd, who deliberately used the data they held to make almost 100 million phone calls selling PPI or accident claims handling – I am sure anyone with a UK number has received some of these; I know I certainly did, and at one point I was getting so many that I changed my phone number.

In this part, we will change tack and look at how companies have mishandled the data they hold, not through poor operational security but by using it in ways they were not authorised to. This is just as important as the previous areas of data handling we looked at, and as we saw with Keurboom and their £400,000 fine, the ICO doesn't treat it as any less serious than losing data through poor security.

Moneysupermarket.com Ltd (MSC LTD)

https://ico.org.uk/media/action-weve-taken/mpns/2014482/mpn-moneysuperma...

One individual had told Moneysupermarket.com that they did not want to receive marketing emails.

In December 2016, Moneysupermarket.com sent that individual an email saying they had updated their terms and conditions and highlighting that they had refreshed their privacy policy. The email included this section (copied from the enforcement action):

“We hold an e-mail address for you which means we could be sending your personalised news, products and promotions. You’ve told us in the past you prefer not to receive these. If you’d like to reconsider, simply click the following link to start receiving our e-mails”. This was followed by a large ‘click link’ box entitled ‘Go To Preferences’”

This individual complained to the ICO; the essence of the complaint was:

you can't use someone's data to ask them if you can use their data

because the email asking for consent to future marketing messages was itself direct marketing.

Moneysupermarket.com explained that they had sent the email 7,127,415 times (only 6,788,496 were received). Sidenote: in all these email cases, the people being fined always point out "we sent x, but only y were ever received". If I were the ICO I would ignore that, as they attempted to send the larger amount and the intent was that every one would be received, but I don't work for the ICO, so ho-hum.

Every customer who received that email had previously opted out of receiving direct marketing emails. Moneysupermarket.com could not provide any evidence that the recipients of those emails had consented to receive the messages.

The ICO felt that it was a deliberate contravention of regulation 22. The enforcement action again highlights the ICO's guidance, which it gives out for free (I can't stress this enough: if the ICO have it in their documentation, you had better know about it and act on it).

Outcome: £80,000 from one complaint.

Honda Motor Europe Limited t/a Honda (U.K.)

https://ico.org.uk/media/action-weve-taken/mpns/2013732/mpn-honda-europe...

Honda had a list of 343,093 users but no opt-in or opt-out information, so they sent them all an email asking whether they would like to hear from Honda: do you want to opt in or opt out? As with Moneysupermarket.com, a single complaint to the ICO kicked off an investigation. The details of this one really are interesting.

Honda was collecting email addresses from lots of sources, including their dealerships, which were separate legal entities (important). Honda had a website for dealerships to enter the details of users who consented to marketing, but there was a problem with the site: there was no validation on the "person accepts marketing gumpf" field, so some dealers left it blank, some put in an X and some put in poo emojis (I made the last one up). Honda therefore had all these people's contact details and didn't know whether they could market to them or not.

Honda said it sent the emails not as a marketing activity but as a service email, to ensure they were maintaining compliance with the data protection principles. They stated that they would only keep on their marketing lists people who replied positively and would remove everyone else.

Outcome: £13,000.

This was interesting, and what it really shows is that if you make a mistake and can't provide evidence of consent, then you may as well delete the data you have, because you can't use it.

This case wasn't helped by Honda continuing to send emails after the ICO had warned them; they only stopped when the ICO told them to cease sending the emails.

If you are thinking that you will start contacting people to get their consent before GDPR comes in, but you haven't already got that consent and can't provide the evidence to the ICO should a complaint be made, then you should expect a fine.

Macmillan Cancer Support

I will end this part by moving away from corporations whose goal is to make money, to two charities that also aim to raise money, but to do some good in the world rather than to make a profit for shareholders. Around the end of 2015 and the beginning of 2016 there was a raft of enforcement action against charities because of the media attention at the time.

Because of the nature of charities, the fines are smaller but still significant.

Macmillan held data on, and had consent to market to, a large number of its supporters, but they did two things wrong. First, in 2009 and 2014 they used the services of a wealth screening company to find wealthy or high-value individuals amongst Macmillan's donors. Wealth analysis was not in Macmillan's privacy policy at the time, so individuals could not have consented to it.

The second thing was telematching, which involves trying to match individuals to telephone numbers so they can be called. Macmillan's privacy policy did not state that it did telematching, so again users could not have consented to it.

Outcome: £14,000

There are lots of mitigating factors here, and if this had not been a charity, expect the fine to have been higher. Consent, and only using the data in ways which have been consented to, are so important.

Battersea Dogs and Cats Home

Finishing off this retrospective of previous ICO enforcements, we have Battersea Dogs and Cats Home; it is timely as I am on the train literally going past it right now.

In 2015 charities were in the media over their use of data, so the ICO got in touch to see what Battersea was doing with its data. This wasn't in response to a specific incident; they just decided to get in touch and have a poke about.

Between 2010 and 2015, BDCH (Battersea Dogs and Cats Home) passed 740,181 records to a third party to telematch the individuals with a phone number. They managed to match 385,709 records, and 229,476 people were contacted.

BDCH's privacy notice did not say that they would do any telematching.

Outcome: £9,000 fine.

So there you have it, that is my selection of favourite previous ICO enforcements. I hope they have given a range of views on what the ICO may choose to do in terms of fines. The bottom line is to start preparing for GDPR, and if you do know about any breach, probably notify them now rather than let the issue stretch into the GDPR timeframe, where the fines could be a lot heavier.

I hope you have enjoyed this. If you are unsure where to start, re-read the first part; there are plenty of people out there looking to cash in on GDPR, so I am sure you will be able to find someone to help :)

The previous parts are available:

Part one is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-1
Part two is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2
Part three: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/gdpr-panic-part-3


tSQLt Course

  • Posted on: 6 February 2018
  • By: Ed Elliott

In the last half of 2017, I decided to run a tSQLt course and went down the route of putting together a free 12-week email course. The intention was that every week students would get a bit of a description about how unit testing T-SQL using tSQLt works and a little bit of homework to ease them into writing tests both in terms of how to technically get tSQLt running but also the art of writing effective unit tests.

I originally thought I would offer the course and, if ten people signed up, go ahead and run it; I was a bit shocked when I closed the initial list at over 100 people. I wrote and delivered the course, which actually turned out to be a 10-week course, and I changed from email to an online course hosting app called Teachable to deliver it.

The course is now live and you can join whenever you like from:

https://theagilesqlclub.teachable.com/courses/unit-testing-sql-server-co...



Re-Imagining an SSIS IDE

  • Posted on: 4 February 2018
  • By: Ed Elliott

Re-Imagining the SSIS development environment.


SSIS is a powerful and versatile tool for extracting, transforming and loading data into or out of SQL Server. The development environments, BIDS and now SSDT, allow developers to create, edit and debug SSIS packages, but I struggle with the actual IDE. The IDE is designed around a series of dialog boxes, tabs and a graphical map of the package as a whole. SSIS packages let you embed C# or VB.NET scripts, and to open one of these scripts a whole separate instance of Visual Studio is started; I have worked on projects where opening a script can literally take five minutes.

Taking five minutes to see what a script looks like, then realizing you need to do something else, like look at another script or check which variables are being passed into the script you are looking at, and having to close down the second copy of Visual Studio, go and find what you want, and then reopen that second copy of Visual Studio means that using SSIS can be frustrating.

It isn't just scripts that are a struggle to work with; knowing whether to look at an object's properties, the edit dialog or even the advanced editor dialog means struggling to find the information you want. I saw a recent discussion in Slack where someone couldn't work out why a column was being truncated to one character. It turned out that in an advanced dialog on one of the outputs the column width was set to 1. A good SSIS dev spent more than a day trying to find that option.

SSIS has a series of event handlers, and to see these you need to click on each object and then go through the drop-down list of handlers to find out where code might potentially run. When you are under pressure and don't know why something is happening, event handlers can be easily and frustratingly missed.

The package itself, the .dtsx file, contains all the metadata about things like the files it needs so that it can import data with the correct types, but if you open one of the dialog boxes that requires a file and the file isn't there or is inaccessible, you get a delay and an error message.

The main part of the UI is a graphical map of the objects in a package. I have spent quite a lot of time trying to think of a better way of showing the elements that make up a package, and it is hard to argue with the workflow approach; all the ETL tools I found do the same thing, and visualizing packages as a series of interconnected tasks is useful, but maybe it shouldn't be the only way.

Because of all these issues I have really tried to stay away from SSIS development and have looked for other solutions, such as using Biml to write packages or writing code in other languages, but I keep coming back to the fact that the SSIS engine is great at what it does, and if you have a lot of data or a lot of files/tables then writing a similar framework would be a lot of wasted effort. So, how can we use the SSIS engine but make better, more efficient use of our development time?

I would always advocate using Biml to create packages, even simple ones, as it is much easier to see what a package is doing and much easier to version control the Biml XML than the .dtsx file. This gets us so far, but it doesn't help in environments where Biml hasn't been used, or when examining packages which have been deployed and the Biml lost.

What I decided to do was write a tool that could be used alongside BIDS/SSDT or even Biml, with one overarching goal: any value, property or script, in fact anything within a .dtsx file, should be available with a single click and no more than a one-second delay. With that in mind, I have written K-SSIS-ed, a tool to view .dtsx files that helps make using BIDS/SSDT more efficient and less frustrating.

You can download it using the buttons at the bottom, and as it is 2018 it is obviously cross-platform! (I've only tested on Windows but have also built Linux and Mac versions.)

I hope someone else finds it useful as a package reader; there are no editing capabilities, but exploring and checking packages should be less frustrating.

I am using TypeScript to parse the .dtsx XML. There are a few versions of the .dtsx format and I have tried to cater for them all, but if you have a package that doesn't work, create a sample and open a GitHub issue here: https://github.com/GoEddie/K-SSIS-ed-docs/issues.
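
To give a flavour of the approach (this is a minimal sketch, not the actual K-SSIS-ed source), here is one way to pull the task tree out of a .dtsx file with TypeScript. It assumes the fast-xml-parser npm package and the common layout of DTS:Executables/DTS:Executable elements with DTS:ObjectName and DTS:ExecutableType attributes; older .dtsx versions may differ.

// Sketch: list the executables (tasks and containers) in a .dtsx package.
// Assumes `npm install fast-xml-parser` and a package using the DTS: namespace prefix.
import { readFileSync } from "fs";
import { XMLParser } from "fast-xml-parser";

interface TaskInfo { name: string; type: string; children: TaskInfo[]; }

const parser = new XMLParser({ ignoreAttributes: false, attributeNamePrefix: "@_" });

function listExecutables(node: any): TaskInfo[] {
  // DTS:Executables can be missing, a single object or an array depending on the package.
  const raw = node?.["DTS:Executables"]?.["DTS:Executable"];
  if (!raw) return [];
  const executables = Array.isArray(raw) ? raw : [raw];
  return executables.map((e: any) => ({
    name: e["@_DTS:ObjectName"] ?? "(unnamed)",
    type: e["@_DTS:ExecutableType"] ?? "(unknown)",
    children: listExecutables(e),
  }));
}

const dtsx = parser.parse(readFileSync(process.argv[2], "utf8"));
console.log(JSON.stringify(listExecutables(dtsx["DTS:Executable"]), null, 2));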

SSIS can be extended with custom components, and these can supply their own dialogs to configure the component. Loading a .NET DLL and displaying a dialog goes against my main rule of one-click, one-second access to the information you need, so I fall back to displaying the XML for most tasks and configure custom displays for things like script tasks and SQL tasks.

This tool is definitely for someone experienced with SSIS and not for a beginner; you will be exposed to lots of internal data and be expected to know what it means, but that is a tradeoff I am willing to make.

Support

For support, please raise an issue: https://github.com/GoEddie/K-SSIS-ed-docs

Download

More Download Versions:

https://github.com/GoEddie/K-SSIS-ed-docs/releases

gdpr - panic part 3

  • Posted on: 1 February 2018
  • By: Ed Elliott

Welcome to part three of this short series.

Part one is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-1
Part two is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2
This is part three: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/gdpr-panic-part-3

In part two I started looking at previous enforcement action taken by the ICO: Talk Talk (three times!!!) and the Historical Society:

https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2

In this part, we will start by looking at the Royal & Sun Alliance Insurance PLC from January 2017:

Royal & Sun Alliance

https://ico.org.uk/media/action-weve-taken/mpns/1625635/mpn-royal-sun-al...

The RSA has a data centre in Horsham, West Sussex - I only mention this as I live nearby.

In the second half of May 2015, someone who had access to the server room stole a NAS device.

Whoever took it had a key card to get in, i.e. it was a secure area. However, 40 of RSA's staff and contractors (some of whom were non-essential) were permitted to access the server room unaccompanied.

This, for me, is the key point: letting people do more than they need to is a common theme amongst a lot of the enforcement actions.

The NAS device held (among other things):

"personal data sets containing 59,592 customer names, addresses, bank account and sort code numbers and 20,000 customer names, addresses and credit card ‘Primary Account Numbers’. However, it did not hold expiry dates or CVV numbers"

The device was password protected but not encrypted. This is key: if the data had been encrypted and then stolen, it would not have been so bad. The ICO decided here that the lack of monitoring of the device, its whereabouts and its disappearance was a problem.

The server room didn't have CCTV monitoring and non-essential staff could access it; if five people had had access instead of 40, it probably wouldn't have happened.

The ICO considered it an ongoing issue, as the device was first installed in April 2013 and remained unencrypted and unmonitored until it was stolen in 2015.

The ICO said they had no evidence the data was ever even accessed; it might have been that a cleaner accidentally spilt coffee on it and chucked it, but the fact that they don't know what happened to it is a problem. When RSA notified its customers, there were 196 complaints to the ICO. The ICO therefore fined RSA £150,000, and it would have been higher if they hadn't voluntarily taken substantial remedial action.

Outcome? £150,000 fine

The next one is the Carphone Warehouse, who were fined a stonking £400,000.

Carphone Warehouse

https://ico.org.uk/media/action-weve-taken/mpns/2172972/carphone-warehou...

Carphone Warehouse had a set of virtual machines on which they hosted internal and external websites. Someone used the WordPress admin account on one box to upload some code that gave them access to the box and then the ability to find and download a large amount of data.

In this case, it wasn't so much that the incident happened as the generally poor security around it that led to such a big fine.

- There was no WAF (web application firewall) in-front of the applications.
- The attacker used an off-the-shelf hacking tool to find vulnerabilities they could use to "very quickly" gain access and steal data.
- It took them 15 days to discover there was an attacker on their systems
- They don't know what data was stolen so have assumed the worst, i.e. all of it
- The attacker found plain-text credentials to the systems they needed
- The WordPress installation was from 2010 and hadn't been patched in six years
- Patch management was not being followed at all in that business area
- The system was accessed using valid credentials but they had no credential management to know who could have used the credentials or who even knew what the credentials were.
- They had no pen testing or vulnerability scanning procedures
- None of the systems had antivirus installed, even though there was a policy to implement AV
- The servers in this VM farm all had the same root password which was known by 30-40 members of staff. Carphone Warehouse couldn't give a good reason why this wide-ranging access was allowed
- The system contained historic financial data like credit card numbers because an external developer had started logging it as a temporary measure and forgot to turn it off again. Carphone Warehouse said it was not aware of this storage, and the ICO told them that not knowing about it meant they had an "inadequate understanding of its IT systems, architecture, at least in terms of locations of personal data on those systems." Ouch. And "Without an adequate understanding of these issues, security arrangements were likely to be inadequate". Double ouch!
- The historic transactions were encrypted, but the keys were stored in plain text in the application code.

The personal data stored in the applications included:

- 3,348,869 customer records with full contact details, marital status, phone numbers, current and previous addresses
- 389 customers across two other companies
- Historic financial transactions including name, address, expiry date and all card numbers (PAN, CID, CVC2, CVV2)
- Approximately 1,000 employees' details, including contact details and car registration numbers

So they suffered from a general malaise in security and had almost 3.5 million customers' personal details stolen.

I am going to finish up this part (and now make it a four-part series) by looking at what happens when, instead of having poor security and being attacked, you are on the other side of the fence, up to shenanigans.

Outcome? £400,000 fine

Keurboom Communications Ltd

Keurboom is another company in the £400,000 club.

https://ico.org.uk/media/action-weve-taken/enforcement-notices/2014013/m...

Keurboom used an automated dialling system to make over ninety-nine and a half million phone calls in a year and a half. They would call the same number repeatedly in the same day and at unsocial hours, and often couldn't even connect the person to anyone if they did want to be put through. The calls were typically about PPI and gave the impression of being urgent, upsetting anyone who was in the process of dealing with a PPI claim or a road accident.

The ICO accepted that they did not mean to upset anyone, but the calls were made without prior consent and they were made deliberately on a massive scale.

Outcome? £400,000 fine

I really feel that post-GDPR Keurboom would have been hit a lot harder, let's hope so!

So, that is enough for this part. In the final part I will look at some interesting uses of telematching by charities, and a couple of household brands who got a bit excited with their Outlook address books.

Part one is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-1
Part two is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2
This is part three: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/gdpr-panic-part-3


gdpr - panic part 2

  • Posted on: 1 February 2018
  • By: Ed Elliott

Welcome GDPR friends :)

Part one is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-1
This is part two: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2
Part three is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/gdpr-panic-part-3

In this part, I am going to have a look at previous action the ICO has taken, to give some context to the scary-sounding 4% of turnover or millions of pounds in fines you could receive if you do something wrong. We obviously can't predict exactly how an enforcement action will go, but we can think about the sort of things we should be doing before, during and after a breach.

Talk Talk

We will start with Talk Talk and a breach they had in March 2016:

https://ico.org.uk/media/action-weve-taken/mpns/1624087/TalkTalk-mpn-201...

A customer found that they could use the password reset facility to access someone else's account and reported it to Talk Talk. Talk Talk fixed it but didn't notify the ICO. The ICO was notified by someone else, and pre-GDPR this lack of notification resulted in a fixed £1,000 penalty.

Outcome: £1000 fine.

We then skip forward to October 2016:

https://ico.org.uk/media/action-weve-taken/mpns/1625131/mpn-Talk-Talk-gr...

Talk Talk bought another company, Tiscali, which had a web app that was publicly accessible and had an unpatched MySQL database. Talk Talk didn't know that the app existed or was available.

Talk Talk bought Tiscali in 2009 and the breach happened in 2015. For six years a website sat on Talk Talk infrastructure and no one knew it was there or that it contained customer data. The ICO wasn't very impressed with that.

A hacker exploited a vulnerability for which a patch had been made available in 2012, but as no one knew the site existed, no one installed the patch in the three years between patch release and attack.

The database contained personal data (name, address, date of birth, telephone number, email address, financial information) for 156,959 customers. The hacker accessed every single record, and also accessed the bank account numbers and sort codes of 15,656 customers.

The ICO didn't take the view that, because Talk Talk didn't know about the application, it wasn't their fault and they should be let off lightly. A common theme is that not knowing is as bad as, if not worse than, knowing and not mitigating.

The mitigating factors included the fact that Talk Talk notified the ICO, they helped with the investigation, they bought monitoring software for everyone involved, plus they had already started an improvement programme. Still, they lost data that people had the right to expect would be kept private.

Outcome? £400K fine

It was almost at the top £500K but not quite!

Question: are there any old applications on your network which contain customer data that you don't even know you have? Did any of your sales team do a POC with Salesforce or something, send customer data over to it, or store data in a spreadsheet on a network share somewhere? These are the things that can cause you a problem.

Sticking with Talk Talk, we zoom forward in time, like Bill and Ted on some sort of excellent data-spewing adventure, to October 2017, where once again Talk Talk found themselves sat down for a chat with the ICO. There are a few historical enforcements that are interesting and should really be taken note of; this certainly is one of them.

In 2002, Talk Talk created a "web portal" to monitor complaints (back in 2002 we didn't have web apps, we had portals!). Talk Talk outsourced the managing of complaints to Wipro, but the portal had been in use since 2002, presumably internally.

Managing these complaints was both a business need and a regulatory requirement from Ofcom under the Communications Act 2003.

Wildly speculating: if the portal was delivered in 2002 for a regulatory requirement in 2003, I would guess it was probably rushed, but that is just pure speculation.

The web portal used a username/password to access a publicly available URL, i.e. no VPN required. Forty users at Wipro had access to between 25,000 and 50,000 users' personal details at any one time. Talk Talk managed the user accounts that Wipro could use to connect to the site.

Personal data, in this case, meant name, address, telephone number and Talk Talk account number, so no date of birth, mother's maiden name, bank details etc.

Wipro had the ability to search using wildcards to find up to 500 accounts at a time; they could also export the results of searches to create reports, which were used to meet Talk Talk's regulatory commitments.

In 2014, 12 years after the portal was first implemented, Talk Talk began receiving complaints from customers about scam calls from people pretending to be from Talk Talk who were able to quote users' addresses and Talk Talk account numbers.

In September 2014, Talk Talk commissioned an investigation and alerted the ICO. The investigation found three Wipro accounts which had accessed the personal data of 21,000 users, but there was no definite link between those downloaded records and the scam calls, i.e. it could not be proven that this was the source of the scam calls.

Talk Talk started mitigating the issue by writing to all of its customers telling them how to deal with scam calls. Talk Talk told the ICO what had happened, and the ICO responded with their own investigation and the following:

Outcome? £100,000 fine

The reasons were:

- The system failed to have adequate controls over who could access which records, i.e. anyone could access any record, not just the cases they were working on
- The exports allowed all fields, not just the ones required for the regulatory reports
- Wipro were able to make wildcard searches
- The issue was long-running, from 2004 when Wipro were given access until 2014

One of the mitigating factors was that there was no evidence that this was even the source of the scam calls, plus there is no evidence anyone suffered any damage or distress as a result of this incident.

So no one was harmed, a trusted third party was given access, and there is no evidence that anything bad came out of it, BUT the system wasn't very secure and was generally not built with security in mind. Remember, 2003 was a few years before Microsoft had their big security push, and security wasn't really even thought about back then. Yet being a legacy app didn't buy Talk Talk any leniency.

£100K. I know there are applications out there where trusted third parties (and indeed internal employees) have more access than this. This shows that even though you need to trust your employees, you still must make sure they can't go rogue and just start taking what they want. Do you have a sysadmin with unfettered access to everything? Of course you do, everyone does. What does the ICO think of that? Do they think 4% of turnover or a slap on the wrist?

Talk Talk's total fines: £501,000 (you get a discount if you pay early). If the ICO apply the same proportions under GDPR, then those are some big numbers!

The Historical Society

This is another really interesting case because it highlights that sensitive data might not be the things we normally think about, like contact details or bank account details. This one is heavily redacted, the most heavily redacted ICO enforcement action I have seen. The redaction, for me, adds to the intrigue:

https://ico.org.uk/media/action-weve-taken/mpns/1625357/mpn-historical-s...

An admin officer left an unencrypted laptop somewhere redacted; there was a break-in and the laptop was stolen along with other things. The police did not recover the laptop.

The Historical Society bought the laptop specifically to be taken around to multiple sites, so it wasn't a desktop meant to stay in the office; it was always intended to be carried from site to site in public. I wasn't sure why this was highlighted in the report, but I think it shows that the ICO expect people to think about security in everything they do, all the way from purchasing to losing the laptop in a redacted.

The artefacts were stored in four locations, and the officer took the laptop home so they could visit one of the locations. The laptop contained, amongst other things, a list of individuals who had donated or loaned artefacts to the Historical Society, including redacted.

The ICO imposed a fine and decided it was serious because of the sensitive nature of some of the personal data held on the laptop and the potential consequences. I take this to mean that someone owns something they probably shouldn't, and would be publicly embarrassed if that information were released. No dates of birth or contact details, but sensitive data about what someone keeps on their study shelf.

The ICO also knew the information hadn't been made public, as it would otherwise have been out in the open.

One of the interesting things that everyone must take note of is that the ICO mentions that they published guidance on their website about inadequately protected devices such as laptops, and this advice was not heeded or implemented. In other words, if the ICO publish guidance, follow it; you can't ignore them.

I point this out because the ICO have a checklist for GDPR, and I would expect that if, during an investigation, you can't demonstrate that you know about it and have completed it, that is not going to be looked upon favourably.

The fine was only £500, but if this had been a corporation you can bet it would have been higher. The ICO is not allowed to cause anyone financial hardship; each report has something along the lines of "the Commissioner has determined that the amount of the fine would not cause financial hardship". We really need to see an action after GDPR is implemented: do they drop this?

I was going to make this a two-part series, but there are some more previous enforcements I want to go over, and this is already over 1,500 words, so I will add a part three, coming shortly.

What have we learned from Talk Talk and the Historical Society:

- Not knowing about a system isn't a reason to not be secure
- You have to be aware of your responsibilities and any guidance the ICO issues
- Sensitive data is not just bank account details
- Security starts before purchasing and ends when you have no data


gdpr - panic part 1

  • Posted on: 1 February 2018
  • By: Ed Elliott

GDPR is coming (or, if you are reading this in a few weeks, GDPR is here). What do you need to know and where do you start?

This post is based on how GDPR will apply to the UK. I have nothing against the EU, but in the UK it is the ICO which governs GDPR. Each country is allowed to add specific requirements, and each country does, so if you are looking for how GDPR applies to another country the ICO can't help you, sorry.

The ICO documentation on GDPR is pretty good. The ICO also happens to be the one who decides what fines you get, should you get caught doing something wrong. The ICO has a history of giving greater fines to people who ignore the information and warnings they put out, so if you are getting started, go here and have a read:

https://ico.org.uk/for-organisations/guide-to-the-general-data-protectio...

Once you have read that, the ICO has a great 12-step plan to get yourselves GDPR fit; use this as a guide to start your GDPR journey of enlightenment:

https://ico.org.uk/media/1624219/preparing-for-the-gdpr-12-steps.pdf

If you want to see how compliant (or otherwise) you might be, the ICO have a checklist. How do you score on this?

https://ico.org.uk/for-organisations/resources-and-support/data-protecti...

The checklist starts off by asking whether you are a data controller or a data processor; to find out if you are either of those things (or both, or neither) see:

https://ico.org.uk/for-organisations/guide-to-the-general-data-protectio...

Once you have read all that you can find the actual GDPR regulation:

https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...

Surprisingly, it is pretty easy to read and follow the document. I have in the past read and written patents, and I assumed the GDPR authors would have used the same awful legalese, but the regulation is in plain English (at least this version is; after Brexit, who knows what language EU docs will be written in, but I guess it won't matter to the UK).

To read the articles and the actual requirements, I would start at page 32, which begins "HAVE ADOPTED THIS REGULATION:"; this lists each of the articles (requirements). You can go through each of these and make sure you are compliant with them.

The exciting bit, the fines

The exciting, headline-grabbing parts of GDPR are the fines that can be imposed. We don't yet know how the ICO will apply the fines; words like "maximum" are used, and the maximum possible fines are large. It is possible that the maximum fines will apply, but in part two we will look at previous ICO enforcement actions to see if the ICO's past performance gives us any clues as to its possible future decisions.

There are two tiers of fine that the GDPR says that the ICO can impose upon a data controller or data processor. The section of the GDPR we need is:

"Article 83" - "General conditions for imposing administrative fines"

Article 83 says that while any fines should be proportionate, they should also be dissuasive, so there is a balance between not being too harsh but also being harsh enough to warn other people. It also says the amount of the fine should take into account:

- The seriousness
- Whether it was intentional or not
- Whether the controller or processor put in any mitigations
- How much responsibility the controller or processor took
- Any previous infringements
- The types of personal data affected
- Whether the controller or processor notified the ICO, or whether they tried to hush it up
- Any other aggravating or mitigating factors.

To summarise, if you were a good data controller/processor, made a mistake and told the ICO yourselves, you would potentially get a smaller fine than if you repeatedly made mistakes and tried to keep them quiet, or deliberately did something nefarious.

Fine Tiers

The first tier is up to 10M EUR or 2% of turnover (whichever is greater); in the UK the ICO maximum is £9 million or 2% of turnover. To see the things that are included in the first tier, see the GDPR document, Article 83, section 4, or search for:

"Infringements of the following provisions shall, in accordance with paragraph 2, be subject to administrative fines up to 10 000 000 EUR, or in the case of an undertaking, up to 2 % of the total worldwide annual turnover of the preceding financial year, whichever is higher:"

To see what can trigger the larger fines of £18 million or 4% of turnover, see Article 83, section 5, or search for:

"Infringements of the following provisions shall, in accordance with paragraph 2, be subject to administrative fines up to 20 000 000 EUR, or in the case of an undertaking, up to 4 % of the total worldwide annual turnover of the preceding financial year, whichever is higher:"

The ICO have published their thoughts on enforcement here:

https://www.gov.uk/government/uploads/system/uploads/attachment_data/fil...

Summary

GDPR is coming, and if you are based in the UK you need to do something about it. Personally I think GDPR is great, as personal data has really been misused. I am looking forward to a serious reduction in things like me buying something from a retailer and the retailer then thinking they have the right to email and generally market to me: they don't, they never have, and now hopefully GDPR will make our data safer and actually more private.

This is part one: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-1
Part two is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/GDPR-Panic-Part-2
Part three is: https://the.agilesql.club/blogs/Ed-Elliott/2018-02-01/gdpr-panic-part-3


Azure ARM template function internals

  • Posted on: 18 December 2017
  • By: Ed Elliott

I have been working with Azure ARM template functions for quite a while; they are exceptionally useful but also quite a pain to work with, and visualizing what the functions will resolve to is often hard.

If you haven't used an ARM template before, ARM templates are JSON documents which are effectively a whole load of properties made up of a name and a value. The value can either just be a literal value or it can include ARM template functions. If we look at this example:

"resources": [
  {
    "type": "Microsoft.Storage/storageAccounts",
    "name": "[variables('storageAccountName')]",
    "apiVersion": "2016-01-01",
    "location": "[resourceGroup().location]",
    "sku": {
      "name": "Standard_LRS"
    },
    "kind": "Storage",
    "properties": {}
  }
]

The values for type, apiVersion, sku name and kind are standard, hardcoded values. The values for name and location are determined at runtime; you can tell the difference because the functions are wrapped in [ and ]. In this case "[variables('storageAccountName')]" just grabs the value of a variable (variables are defined in a JSON object called "variables"). Location's "[resourceGroup().location]" is a little more complicated: it calls a function, "resourceGroup()", which returns an object, and then the value of its location property is used.

I originally thought that Microsoft must be using something like Node, or something else that parsed the functions using JavaScript, as the language looked very similar to JavaScript, or so I thought. When I wrote my VS Code extension to parse these functions, so I could test them locally rather than go through a full publish cycle that takes forever in Azure, that is exactly what I did: I implemented all the functions like 'resourceGroup' and 'variables' as JavaScript functions and then called 'eval' to execute them.
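
To illustrate the idea, here is a minimal sketch of that eval approach (not the extension's actual code); the template functions are stubbed out in TypeScript and the variable and resource group values are made up:

// Stub the ARM template functions as plain functions, strip the surrounding [ ],
// then eval the expression. The stubs below are only referenced from inside eval.
const templateVariables: Record<string, unknown> = { storageAccountName: "mystorage001" }; // made-up value

function variables(name: string): unknown {
  return templateVariables[name];
}

function resourceGroup(): { location: string } {
  return { location: "uksouth" }; // made-up value
}

function concat(...parts: string[]): string {
  return parts.join("");
}

function evaluateArmExpression(expression: string): unknown {
  // Only values wrapped in [ and ] contain template functions.
  if (!(expression.startsWith("[") && expression.endsWith("]"))) {
    return expression; // a plain hardcoded value
  }
  // eslint-disable-next-line no-eval -- fine for a local sketch, not for production code
  return eval(expression.slice(1, -1));
}

console.log(evaluateArmExpression("[variables('storageAccountName')]")); // mystorage001
console.log(evaluateArmExpression("[resourceGroup().location]"));        // uksouth
console.log(evaluateArmExpression("[concat(variables('storageAccountName'), '-logs')]")); // mystorage001-logs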

When I was doing this, two things really stood out as not being quite right. The first was the 'greater' function, which does this:

"Checks whether the first value is greater than the second value."

The function takes two parameters, which can be ints or strings. The int version is fairly simple: 100 is bigger than 1. The string version is not quite as simple. Normally when you compare strings, the language follows the ASCII table, so an upper case 'A' is less than a lower case 'a' because 'A' maps to 65 and 'a' maps to 97. In my JavaScript version I used "param1 > param2", which works exactly like that, but the example on the docs page works the other way around:

https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-g...

'A' is greater than 'a'

So that was the first thing: if they had just used JavaScript to eval the code, they would have needed to implement a custom version of >, and I couldn't see why you would do that.

The second thing that was odd: I had implemented every function, such as greater and equals, and when I called eval those functions were called just fine, but 'if' is a reserved word in JavaScript, so you can't just call a function 'if' (even if it is a method on a class, I tried) because it is reserved. I had to use a regex to convert all calls to 'if' to 'if_'. This worked great for me but felt a little hacky, and I wasn't sure if that is what Microsoft would have done.
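
As a rough sketch of that workaround (the exact regex in my extension may differ), the rewrite looks something like this, with 'if' re-implemented under a legal name:

// Rename calls to the reserved word 'if' so the expression can be eval'd.
// The word boundary stops identifiers that merely contain 'if' (e.g. 'gif(') from matching.
function rewriteIfCalls(expression: string): string {
  return expression.replace(/\bif\s*\(/g, "if_(");
}

// The template language's 'if' implemented as a normal function.
function if_<T>(condition: boolean, trueValue: T, falseValue: T): T {
  return condition ? trueValue : falseValue;
}

console.log(rewriteIfCalls("concat(if(equals(parameters('p'),1),'abc','def'),'x')"));
// => concat(if_(equals(parameters('p'),1),'abc','def'),'x')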

So, I implemented my JavaScript versions of the functions and forgot about it for a week or so, but I couldn't help thinking that the greater function was odd (I have only implemented the wrong version so far, so I know I need to change it at some point). Anyway, after thinking about this off and on for a little while, I woke up one morning and realised that I could probably find out exactly how these were implemented if I used Azure Stack. Azure Stack is a version of Azure that runs on-premises.

So, I grabbed a copy of the Azure Stack SDK and started the download. I thought it would be a case of run an installer, install some stuff and then grab the files, but it turned out to be a little bit harder.

The setup for Azure Stack is basically:

- Download a VHDX of Windows 2016 with a Setup folder
- Run a script that:
  - Validates you meet the minimum requirements
  - Creates some virtual machines
  - Creates a storage cluster
  - Does some other stuff

I initially started the machine and kicked off the installation, and it failed pretty quickly because I didn't have a butt load of RAM on my laptop, I wasn't running on bare metal and I didn't have enough disk space. I found that these checks were Pester tests, so I deleted them and the install continued merrily along (albeit slowly; you need hours to install Azure Stack). I also had to enable nested virtualization on the Azure Stack VM on my laptop using:

Set-VMProcessor -VMName stacky -ExposeVirtualizationExtensions $true

Eventually I got the install running, and after a while I noticed that there was a VHDX called "CommonNugets" being attached to one of the new virtual machines. Being a nosy sort of a fellow, I stopped the install and the virtual machine, mounted the original VHDX that had been created by the Azure Stack installer and had a poke around until I found this NuGet file:

CloudDeploymnet\NuGetStore\AzureStack.Setup.5.14.723.nupkg

This NuGet file contained, drum roll please...

Content\Setup\Packages\AzureStack-ResourceManager.msi

Then I grabbed UniExtract to extract the contents, and I was a little shocked to see not JavaScript files but .NET DLLs. This one in particular:

Microsoft.Data.Edm.dll

I have tried to search for some reference to EDM but it seems it is Microsoft internal as I cannot find any reference to it at all.

So, inside this DLL it shows that they use Antlr.Net to parse the code values in the text. I was particularly pleased that we had both worked out whether a value was a standard value or a template function in the same way, basically doing:

(expression.First() == '[') && (expression.Last() == ']')

Now, how is 'greater' implemented by Microsoft?

First, they check whether either the first or second parameter is a string; if one is a string and the other is not, they throw an exception.

Assuming both are strings, they then use the .NET string.Compare to return an int; if string.Compare returns 0 or greater then the first value is treated as greater than the second.

Under the hood, "string.Compare" uses "System.Globalization.CompareInfo", because as well as the two strings they also pass in "StringComparison.InvariantCulture".

If we look at https://docs.microsoft.com/en-us/dotnet/standard/globalization-localizat... we can see that "FILE" is greater than "file" when using "StringComparison.InvariantCulture", so what we have isn't a straight mapping to the ASCII table but the sort order that Windows uses.
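
When I do get around to fixing my version, switching from the ordinal > operator to a culture-aware comparison is roughly what is needed. Here is a small TypeScript sketch of the difference (localeCompare only approximates .NET's InvariantCulture ordering and can vary by runtime and ICU data, so treat it as an illustration rather than an exact match):

// Ordinal comparison follows code-point order: 'A' (65) < 'a' (97).
function greaterOrdinal(a: string, b: string): boolean {
  return a > b;
}

// Culture-aware comparison: with typical collation rules, case is a tertiary
// difference and upper case sorts after lower case, so 'A' > 'a'.
function greaterCultureAware(a: string, b: string): boolean {
  return a.localeCompare(b, "en") > 0;
}

console.log(greaterOrdinal("A", "a"));      // false: 65 < 97
console.log(greaterCultureAware("A", "a")); // usually true, matching the docs example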

Once I knew they had their own parser, the whole if / if_ regex thing became clear. I guess now I can fix greater at some point; I won't rush, as calling greater on a pair of strings just seems a bit odd to me and I probably won't use it, but at least now I can sleep without wondering how 'greater()' was implemented.


xSQLServer is dead long live SqlServerDsc

  • Posted on: 15 December 2017
  • By: Ed Elliott

I have had a problem with DSC in Azure Automation for a few months now. There was a change made to the xSQLServer DSC resource which meant that the internal paths were too long to be compiled on Azure Automation. I don't even really want to think about why, because the fact that path lengths are still an issue in 2017 (almost 2018!) makes me want to cry, so if you want to know more look at:

https://github.com/PowerShell/DscResources/issues/310

The fix was to deploy the old version 7.1.0.0 to Azure Automation and use that, but yesterday when I tried to use it, it failed as well. I was about to start crying when I found out that a version 10 had been written but not deployed, except it had been deployed, as the renamed SqlServerDsc module, so I switched my DSC script from xSQLServer to SqlServerDsc and everything is now cushty.

In the changelog there is a list of breaking changes, and I had to change my resources from the old style to the new, doing things like changing:


xSQLServerLogin LoginThing {
    SQLServer       = ''
    SQLInstanceName = ''
}

to

SqlServerLogin LoginThing {
    ServerName   = ''
    InstanceName = ''
}

but once that was done and a typo or two fixed, compiling on Azure Automation works again and my life is now complete.

What The Tool? Multiple choice quiz to help you choose a tool for SQL Server

  • Posted on: 14 December 2017
  • By: Ed Elliott

There are a few different choices of tools to use when it comes to SQL Server. While the "experienced" amongst us have used Query Analyzer, Enterprise Manager and probably SSMS, it isn't always clear which tool you should use, and if you add SQL Operations Studio and VS Code into the mix it is a literal minefield of possible options.

I was quite pleased with how my https://the.agilesql.club/WhatTheIO/ tool worked out, so I thought I would build a simple multiple choice quiz to help people decide between SSMS, SSDT, SQL Operations Studio, and VS Code + the mssql add-in:

https://the.agilesql.club/WhatTheTool/

I will probably add the command line tools and maybe some third parties if I feel so inclined.

Enjoy!

My Azure ARM template development workflow

  • Posted on: 13 December 2017
  • By: Ed Elliott

Writing infrastructure as code is pretty nice and I enjoy the declarative approach of defining what you want and letting the tooling take care of the how. I guess this is because of my background as a SQL developer.

What I don't like so much is the development experience and if you don't get this part right it can be a right pain.

What are the challenges?

The first real challenge is writing the correct JSON for each thing you want to create. For example, to create a virtual machine you need to know which API version you should use, which properties exist and which are required versus optional. Once you have that for a virtual machine, you need to know the same details for a NIC, an NSG, a storage account for VM diagnostics and any VM extensions you might want to include (and what properties they have).

Getting all this information together can be quite painful; there are really four approaches:

  1. "Automation Script" in the portal
  2. Viewing JSON schema docs
  3. Examples
  4. Visual Studio deployment project

The first seems the easiest: you deploy something, then in the portal you hit the "Automation script" button, which generates the ARM templates and a script you can use to deploy them with.

There are two problems with this. Firstly, not everything is supported, so if you are deploying cutting-edge resources this won't work for you. The second issue is the way the script generator names all the resources and creates parameters for everything, called something like resource_name_name_parameter; if you have a fair few resources this gets really confusing, really quickly.

The generated script is also for everything in the resource group, even if you only try to generate it for a single resource. Finally, it includes properties like status which aren't part of a deployment and just obscure the actual bits you need. If you do use the automation script option, it should really only be used as a guide to get you started or to check something.

The second option is to look at the ARM JSON schemas and work out which properties are supported with which API version; you can find them here:

https://github.com/Azure/azure-resource-manager-schemas

The third option, using examples from either the Microsoft docs site or GitHub, is good, but often the samples do one very specific thing and you need to spend a while tweaking them. The samples are also all different, and there is no common way of naming things, so merging multiple samples together is often a pain.

The last option is the Visual Studio deployment project, which has a wizard to add resources; this is often quite out of date, and the fix suggested by Microsoft (see the comments at https://docs.microsoft.com/en-us/azure/azure-resource-manager/vs-azure-t...) is to just edit the JSON manually.

So, there really isn't one ideal way to create ARM templates. The way I work is a combination of options 1-3; I don't really use Visual Studio because I have migrated to VS Code.

Once you have the resource versions and properties, you then have two more problems. The first is that you end up with a big blob of JSON, and navigating resources and doing things like copying, pasting and moving things about is a little risky. Visual Studio has a "JSON Viewer" tree view that lets you navigate different objects like parameters and resources, but not the properties of resources, so if you have a large resource you can still only navigate to the beginning of it.

The second problem is that ARM templates have their own JavaScript-like language, where you can have things like:

[concat(if(equals(parameters('parameterName'),1),'abc','def'),'something_else')]

If you use this sort of thing to name your objects, you don't really know whether they are right until you deploy, and these expressions can get really complicated. Both Visual Studio and VS Code have add-ins that provide IntelliSense and red squiggles to show whether the syntax is right or wrong, but they won't let you see whether you have used the right parameter or got the final concatenation correct. The number of times I have deployed something, seen that the name isn't quite correct and had to redeploy is literally infuriating!
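
To show what is actually going on inside an expression like the one above, here is a hand-rolled TypeScript sketch of evaluating it (nothing official, and the parameter value is made up); each template function is small enough to model directly:

// The example expression:
// [concat(if(equals(parameters('parameterName'),1),'abc','def'),'something_else')]
const parameterValues: Record<string, unknown> = { parameterName: 1 }; // made-up parameter value

const parameters = (name: string) => parameterValues[name];
const equals = (a: unknown, b: unknown) => a === b;
const ifFn = <T>(cond: boolean, t: T, f: T) => (cond ? t : f); // the template language's 'if'
const concat = (...parts: string[]) => parts.join("");

const result = concat(
  ifFn(equals(parameters("parameterName"), 1), "abc", "def"),
  "something_else"
);

console.log(result); // 'abcsomething_else' when parameterName is 1, 'defsomething_else' otherwise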

What is the solution?

The first thing is getting the correct resource versions and properties, and to be honest I haven't found a better way than using a combination of the approaches above. I will look at bits already deployed and use the automation script to see what it can look like, and also use existing samples and the Azure docs, such as:

https://docs.microsoft.com/en-us/azure/templates/microsoft.compute/virtu...

If I get really stuck then I will use the template schemas.

For the tooling, I was using Visual Studio, but I was having a problem: I was using Octopus Deploy variables, which do a find and replace on parameters in the parameters file. This is fine for strings and numbers, but when passing arrays the format I need in the parameters file means the file is invalid JSON, and whenever I copied and pasted something into that file Visual Studio tried to "help" with the invalid format and ended up messing it up, so I would have to undo the changes and be very careful about editing that file.

Because of this problem with the parameters file I started using VS Code and the Microsoft Azure ARM extension, which gave me things like IntelliSense, and I could edit the parameters file easily, but it didn't have the JSON outliner that Visual Studio had, so I was back to having a blob of JSON, and editing it became a pain again.

This led me to write a similar JSON outliner for VS Code, but because I wanted to get a better idea of what the code would produce, I also evaluate any properties whose values are code expressions, like:

[concat(if(equals(parameters('parameterName'),1),'abc','def'),'something_else')]

The tree view then shows both the code value and the evaluated value, like:

'abcsomething_else'

I also added the facility to run a template expression like the one above to see what it would look like when deployed; typing lots of if(equals((((((((( etc. into a string and not knowing what it will actually do is scary, so this helps me.

To get this to work there were a couple of things I had to do. Firstly, we need a parameters file that we can find, so I look for a file called parameters.json; if you have a different file for your parameters, or, like me, you have invalid JSON in yours, then you can create a dummy parameters file and point to it using the .vscode/armed.json file, so that when we evaluate an expression we have something we can use for the parameter values.

I also haven't implemented everything. If you use "reference()" to do something like looking up a property of a deployed resource, then I don't evaluate it; I don't want to make the API call as part of an extension, which would be very slow. It is something I am thinking about implementing using something similar to the parameters file, but I will see whether I need it or not.

There are also a couple of gaps, like uniqueString: I can't find out which type of hash is used, so I can't guarantee the result will be the same. The gt() function on strings also behaves the opposite way to the way I expect it to: if you call gt on strings, 'a' is normally greater than 'A', but in ARM it isn't. I've implemented it the way I think it should work, and using gt() with two strings is odd anyway, so I wouldn't use it.

The other oddity is that I use eval to run the code, so you could, if you wanted to, also put valid JavaScript in and it would work in the extension but wouldn't actually work if you deployed it. I don't know why anyone would do that though.

So I use a mixture of ways to get the resources, and my armed extension to navigate templates; it displays resources and properties in the window so you can navigate resources more easily. I also use the evaluate tool to see whether my code not only compiles but also looks correct.
