
Ethical and Data Privacy Concerns Inherent in the AI/ML Space



With the advent of ChatGPT, the apocalypse has been sighted. End of the Middle Class. Destruction of white-collar jobs. Advent of the Age of Machines. All the prophets of doomerism, as Stephen Marche refers to it, rely on cynicism and fear to generate clicks and reactions. “Silicon Valley uses apocalypse for marketing purposes: they tell you their tech is going to end the world to show you how important they are.”


When you stop the flow of data, you interrupt the flow of business. Effective cybersecurity means safeguarding the flow of data as well as its contents. When that data is used to build AI models, securing your digital assets starts at the moment of creation and continues through curation. A few points are worth examining to understand how AI sucks in your digits, chews them up, and ultimately spits out a version of your reality, or view of the world.


First question: Who chooses the data to ingest into a model?

A company generates many kinds of data: documents, communications, imagery, marketing material, financial records, and so forth. Granting access to databases, file repositories, and financial systems inherently opens those systems up to intrusion, because any API or other access point you provide can be compromised. Guaranteeing security at the source is critical. It always comes down to a people problem. Trust the people, but institute best practices such as two-factor (2FA) or multi-factor authentication (MFA) to ensure the person accessing your data is really authenticated and authorized to use it.


But people have biases, and data must be cleansed, normalized, labeled, and in many ways massaged to be useful to machines. Unsupervised learning techniques avoid a lot of these issues. However, they can be less efficient and more prone to statistical errors than having humans curate and preprocess the data set. Companies such as Defined.ai and Heartex.com exist to address this curation need. The quality of data matters, and just as ‘you are what you eat’, your ‘model is what it ingests.’ If you feed it only emails, the result is a skewed view of your company.


Data has a sort of provenance to it, and the evidence can be found in the metadata: ‘was it created by a human or a computer?’, ‘when was it created?’, ‘where is it housed?’, and so forth. Each element can be faked, but together they add up to a sort of fingerprint or watermark, an information trail as it were.
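To make the idea of a metadata fingerprint concrete, here is a minimal Python sketch. The field names (created_by, created_at, location) and the choice of SHA-256 are illustrative assumptions, not a standard and not a description of any particular product. The sketch simply hashes the content together with its metadata so that a change to any one element changes the fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical fields mirroring the questions in the text.
    created_by: str      # "human" or "machine"
    created_at: str      # ISO 8601 timestamp, e.g. "2023-05-01T12:00:00Z"
    location: str        # where the data is housed
    content_sha256: str  # hash of the raw content itself


def make_record(content: bytes, created_by: str, created_at: str,
                location: str) -> ProvenanceRecord:
    """Bundle the content hash with the metadata that describes it."""
    digest = hashlib.sha256(content).hexdigest()
    return ProvenanceRecord(created_by, created_at, location, digest)


def fingerprint(record: ProvenanceRecord) -> str:
    """Hash the whole record; sorted keys keep the result deterministic."""
    canonical = json.dumps(asdict(record), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A fingerprint like this only shows that a record has not changed since it was computed. It does not prove the metadata was true in the first place, which is why human review still matters.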


Second question: What methodology is used to produce the model?

There are many approaches to model generation: Supervised Learning, Unsupervised Learning, Reinforcement Learning, and Deep Learning, to name the most popular. You may also have heard of techniques like ‘zero-shot’ and ‘one-shot’ learning.


Each method is an approach to statistically evaluating the data chosen in answer to the first question. Which approach you pick depends on the results you are looking for. Business applications and solutions to process bottlenecks require differing perspectives to achieve improved outcomes.


These techniques are fundamentally descriptive and focus on what humans actually have done. They do not handle the ambiguity of what we should do or, in the case of negation, what we shouldn’t do. Consider the method used to train an AI to identify which candidates to interview for an open position. If the data set is skewed and the method is then weighted toward a desired outcome, the machine automatically throws out resumes it scores as statistically insignificant. There is no way to argue against it. Valid candidates don’t even get a rejection email. Classifications can be reviewed, but who’s to say it’s the right set of classifiers in the first place? The explainability of your method is just as critical to the process as the data. Values such as beneficence, non-maleficence, autonomy, accountability, responsibility, and privacy are also key dimensions.
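One way to start reviewing a screening model’s decisions is to compare how often each group of candidates is selected. The Python sketch below is illustrative only: the function names and the (group, selected) input shape are assumptions made for this example, and the ratio is just one simple signal among many, not a complete fairness audit.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Return the fraction of candidates selected in each group.

    decisions: iterable of (group, selected) pairs, where selected is a bool.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        if was_selected:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def impact_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so no disparity
    return min(rates.values()) / highest


def needs_review(decisions, threshold=0.8):
    """Flag the model for human review when the ratio falls below threshold.

    The 0.8 default is an assumption, echoing the commonly cited
    'four-fifths' rule of thumb; choose a value that fits your context.
    """
    return impact_ratio(selection_rates(decisions)) < threshold
```

A low ratio does not prove discrimination, and a high one does not rule it out. The point is to put a number in front of a human reviewer instead of letting the classifier’s output go unquestioned.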


Don’t exaggerate what your algorithm or model can do. Don’t speculate that it delivers fair or unbiased results. Your statements must be backed up by evidence, which brings us to the next issue, one of quality and veracity.


Third question: How is the model tested and validated?

As you develop and embrace the new world of automation, AI, and chatbot assistants, transparency will go a long way to build confidence in your user base. You don’t want your AI implementation to be a black box, where the ends justify the means. If we’re not careful, companies will delegate activities and decision-making to bots because the bots can get away with things that we can’t. This evasion of responsibility ends up becoming no more than a legal dodge, a sort of ‘human behavior laundering’ in the guise of technical advances. Who’s to blame? You can’t point to a single person; the machine’s doing the work.

While the alphabet agencies are developing frameworks, and talking compliance and certification, there are still basic steps to testing, validation, and certification you can take. Obvious proactive measures are:

· Independent audits

· Legal reviews

· Opening your data or source code to inspection by 3rd parties


Quality control of the data input lays the foundation for verification of the output. Do the results match the expected area of information and the standard responses a reasonable person would accept? Are statistical measures within tolerances? All of this implies that you have established acceptance criteria and KPIs in advance, expectations that support your business goals.
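Acceptance criteria are easier to enforce when they are written down before testing starts. The following Python sketch shows one possible shape for that check. The metric names and minimum values in the usage note are made-up placeholders, not recommended targets, and the Criterion type is a hypothetical helper for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    metric: str     # name of the measured KPI, e.g. "accuracy"
    minimum: float  # lowest acceptable value, agreed in advance


def evaluate(metrics: dict, criteria: list) -> list:
    """Return a human-readable failure for every criterion that is not met."""
    failures = []
    for criterion in criteria:
        value = metrics.get(criterion.metric)
        if value is None:
            # A KPI that was never measured counts as a failure, not a pass.
            failures.append(f"{criterion.metric}: not measured")
        elif value < criterion.minimum:
            failures.append(
                f"{criterion.metric}: {value:.2f} is below "
                f"the required {criterion.minimum:.2f}"
            )
    return failures
```

For example, evaluate({"accuracy": 0.91}, [Criterion("accuracy", 0.90), Criterion("recall", 0.85)]) reports that recall was not measured. An empty list means every criterion was met, and anything else is a reason to hold the release for human review.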


In testing, skepticism about the data and the output is your best tool. Deep fakes are an increasing problem. Confirmation bias creates a reinforcing feedback loop that increases the willingness of some to believe whatever a machine creates. ‘The data says so’ and ‘that’s what the data shows’ are no longer strong arguments. Corroboration of information means human curation at some point. Quality control is that step.


Fourth question: Who ultimately owns the output?

Whoever owns the output is, by extension, legally responsible for its impact. Lawyers are already being hired to represent AI chatbots. This is an attempt to displace responsibility away from the humans behind the curtain. In effect, the developers and the companies who employ them are trying to say, ‘hands up, we can’t be held responsible for discrimination, it’s an algorithm.’

Logic dictates that we examine this attitude. Common sense says ‘the creator is greater than the creation.’ When humans encode instructions and give direction to a model, they are essentially the puppet masters controlling the strings. A computer sifts through data, looking for patterns. Who told it which patterns were relevant and which patterns to ignore? The humans who wrote the algorithms and code, obviously. The teams that create, curate, process, and quality-control the data should be held responsible for the outcome of their creation.


The next question that immediately comes to mind is ‘who then owns the output?’ Some companies argue that the end user is the legal owner. OpenAI’s terms assign the right, title, and interest in output to the user. Legal experts caution that this may be a nice position to take, but large corporations are wont to change their minds retroactively. Additional criteria for ownership may include ‘independent intellectual effort’ (perhaps demonstrated by prompt creation), ‘originality’ of authorship, and so forth. None of these are inherent in AI-generated output.


If you’d like to learn more about AskRadar’s Knowledge Engagement services, contact us at Sales@AskRadar.ai.


About AskRadar.ai

We believe that people are the key to solving complex problems.


With pinpoint accuracy, Radar connects you with the right expertise, right now, to answer complex internal questions, because complex problems don’t get solved with chatbot answers or crowdsourced chaos.


Radar creates intelligent connections through a combination of computational linguistics, A.I. models, and human intelligence. The result is increased productivity and accelerated operational velocity, with drastically reduced interruptions from those Slack attacks and email blasts. And, when a question has been asked more than once, Radar serves up the most recent relevant expert answer, getting rid of fruitless searches for information.


Radar’s Dynamic Brain learns from every interaction, ingesting conversational data, and gets smarter every day.


 

About the Author

Sharon Bolding is the CTO of AskRadar.ai, an A.I.-powered Enterprise SaaS company. She is a serial entrepreneur, with experience in AI, SaaS, FinTech, and Cybersecurity. With two successful exits of her own, she is a trusted advisor to startups and growing companies. An invited speaker at universities and tech conferences, she focuses on educating users about the ethical use of their data and how AI impacts privacy and security.



