Multi-Threading with PowerShell
I was handed an automation project to get data out of an external web resource so our data management department could build Power BI dashboards and reports from that data. My job was simple: data extraction. This particular external web resource provides a SOAP API for data extraction. I know what you're already thinking: SOAP is outdated and should be replaced with REST. I fully agree, but it's what I was given to work with…
When I began making my API calls to get a look at the data, I was retrieving sometimes hundreds of nodes and sometimes tens of thousands. I originally wrote my PowerShell functions to process the data in a single-threaded fashion, as scripts generally do. After timing them with Measure-Command, I realized that this approach was just too slow for what I wanted it to do.
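As a rough illustration of that kind of timing, the sketch below measures a single-threaded loop. The function Convert-Record and the sample data are hypothetical stand-ins, not the real processing code:

```powershell
# Hypothetical stand-in for the real per-record processing.
function Convert-Record {
    param([int]$Id)
    [pscustomobject]@{ Id = $Id; Name = "Record $Id" }
}

# Sample data standing in for the nodes returned by the API.
$records = 1..15342

# Measure-Command returns a TimeSpan describing how long the block took.
$elapsed = Measure-Command {
    foreach ($id in $records) {
        $null = Convert-Record -Id $id
    }
}

'{0} records processed in {1:N0} ms' -f $records.Count, $elapsed.TotalMilliseconds
```

Running the same measurement before and after a refactor gives a like-for-like comparison.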
Using my largest dataset (15k+ objects), I began to refactor my functions for multi-threaded processing. I knew that my code needed to be thread-safe and had to guarantee that every record was processed and none were skipped.
First, I needed to split my large dataset into more manageable chunks for processing:
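A minimal sketch of this kind of split might look like the following. The variable names ($records, $chunkSize, $groups) and the sample data are assumptions chosen for illustration:

```powershell
# Sample data standing in for the 15,342 nodes returned by the API.
$records   = 1..15342
$chunkSize = 3000

# Number of groups needed, rounding up so the remainder gets its own group.
$groupCount = [math]::Ceiling($records.Count / $chunkSize)

$groups = @(
    for ($i = 0; $i -lt $groupCount; $i++) {
        $start = $i * $chunkSize
        $end   = [math]::Min($start + $chunkSize, $records.Count) - 1
        # The unary comma emits each slice as one nested array
        # instead of letting PowerShell flatten it into the output.
        , $records[$start..$end]
    }
)

$groups.Count       # 6
$groups[0].Count    # 3000
$groups[-1].Count   # 342
```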
The split turns my dataset of 15,342 objects into six groups of at most 3,000 objects each: five groups of 3,000 and one group of 342.
Next, I had to decide how to process each group. I knew that I wanted to process the groups simultaneously, but I wasn't sure how to do it safely. Then I came across synchronized hashtables.
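A synchronized hashtable is a thread-safe wrapper around a hashtable that multiple threads can read from and write to. Each individual operation takes a lock on the table, so competing requests wait their turn until the current request has completed (loosely similar to a lock hint such as ROWLOCK in SQL, although here the lock covers the whole table rather than a single row).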
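Creating one is a single call to the .NET Hashtable.Synchronized method. The sketch below uses an assumed variable name ($results) and a made-up key:

```powershell
# Wrap an ordinary hashtable in a thread-safe wrapper.
$results = [hashtable]::Synchronized(@{})

# Single reads and writes are safe to perform from any thread.
$results['Group0'] = 'done'
$results.IsSynchronized   # True

# Enumerating is not a single operation, so hold the lock on SyncRoot
# for the whole loop if other threads may still be writing.
[System.Threading.Monitor]::Enter($results.SyncRoot)
try {
    foreach ($key in $results.Keys) {
        '{0} = {1}' -f $key, $results[$key]
    }
}
finally {
    [System.Threading.Monitor]::Exit($results.SyncRoot)
}
```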
Next, I created the runspaces needed to process the groups in parallel:
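A minimal sketch of creating such a pool, assuming six groups and the illustrative variable names $groupCount and $pool:

```powershell
# Assumed for illustration: six groups, as in the dataset described here.
$groupCount = 6

# Minimum of 1 runspace, maximum of one runspace per group.
$pool = [runspacefactory]::CreateRunspacePool(1, $groupCount)
$pool.Open()

$maxThreads = $pool.GetMaxRunspaces()
$maxThreads   # 6

# Close and dispose of the pool once all work has finished.
$pool.Close()
$pool.Dispose()
```

Capping the maximum at the number of groups means every group can run at once without the pool creating runspaces that would sit idle.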
Here I create a RunspacePool with a minimum of one runspace and a maximum equal to the total number of groups I have.
Now I process the groups across six threads in parallel:
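As a hedged sketch of this dispatch step, the example below uses three small sample groups and a placeholder worker that only records each group's size. The parameter names and the "Group$Index" keys are assumptions made for illustration:

```powershell
# Small sample groups standing in for the real chunks.
$groups  = @((1..5), (6..10), (11..12))
$results = [hashtable]::Synchronized(@{})

# Placeholder worker: records how many items its group contains.
$script = {
    param($Index, $Group, $Results)
    $Results["Group$Index"] = $Group.Count
}

$pool = [runspacefactory]::CreateRunspacePool(1, $groups.Count)
$pool.Open()

# Start one PowerShell instance per group; the pool supplies the runspaces.
$jobs = for ($i = 0; $i -lt $groups.Count; $i++) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    $null = $ps.AddScript($script.ToString()).
                AddArgument($i).
                AddArgument($groups[$i]).
                AddArgument($results)
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for every instance to finish, then clean up.
foreach ($job in $jobs) {
    $null = $job.PowerShell.EndInvoke($job.Handle)
    $job.PowerShell.Dispose()
}
$pool.Close()
$pool.Dispose()

$results.Count       # 3
$results['Group2']   # 2
```

BeginInvoke returns immediately, so all groups are started before the loop that calls EndInvoke begins waiting on them.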
This code block hands each group to its own runspace from the pool and processes that group with the PowerShell code stored in the variable $script. Here is an example of what is in $script:
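A worker script block of this general shape might look like the sketch below. The record properties (Id, Name) and the trimming step are hypothetical, since the real processing depends on the data returned by the API:

```powershell
# Hypothetical worker: cleans up each record in its group and stores the
# result in the shared synchronized hashtable, keyed by the record's Id.
$script = {
    param($Group, $Results)
    foreach ($record in $Group) {
        $Results[$record.Id] = [pscustomobject]@{
            Id   = $record.Id
            Name = $record.Name.Trim()
        }
    }
}

# Quick check on the current thread before handing it to a runspace.
$sample = @(
    [pscustomobject]@{ Id = 1; Name = ' Alpha ' }
    [pscustomobject]@{ Id = 2; Name = 'Beta' }
)
$results = [hashtable]::Synchronized(@{})
& $script -Group $sample -Results $results

$results.Count     # 2
$results[1].Name   # Alpha
```

Keying the results by a unique record identifier means threads never overwrite each other's entries, and comparing the final count with the size of the input is a simple way to confirm that no record was skipped.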
Learning multi-threaded processing with PowerShell was extremely enlightening and has unlocked a new ability for me and my team!