Back to the main data prep VBA page
I work within development organizations helping to tune high scale workloads for our large scale commercial customer base. These customers typically deploy in a variety of environments that include many different database technologies, often ones not directly owned by the company I work for. Since our application is heavily dependent on the performance of the underlying database, I’ve worked up many reference scripts used to parse performance data provided by the different databases in use so I can create a time series view of key performance metrics from the RDBMS.
In this case I have presented a simple script that parses a directory full of Oracle Statspack text files that contain performance data on regular snap intervals. These files contain a wealth of performance data that is often used as a single report that spans multiple hours. Being a performance guy I want a time series view of this data that uses smaller snap intervals and tracks key individual metrics over time so I can correlate any application tier issues to specific database metrics captured by Oracle Statspack.
The following script will create a single pipe (“|”) delimited output file that contains a row per report with the key sections of the Oracle Statspack output I am interested in. This script can easily be adapted to target other sections of Oracle Statspack output, but there are a few interesting parsing examples that can be altered as needed. Here’s the script in it’s entirety:
Sub importAWRtext() 'June 2014 'routine to import text based AWR snaps 'logic allows for multi-node imports on RAC environments 'point to a directory and import all reports 'set variables Dim rawLOG As String, rawLOGpath As String Dim splitArray() As String, StrOut As String Dim awrTextArr() As Variant Dim lineCt As Double pathFull = Sheets("input").Range("C13") Set objrawLOG = CreateObject("Scripting.FileSystemObject") rawLOGpath = objrawLOG.GetFile(pathFull).ParentFolder.path & "\" 'setup clean files srcCSV = Dir(objrawLOG.GetFile(pathFull).ParentFolder.path & "\") Do While srcCSV <> "" tempFILE = rawLOGpath & "\" & srcCSV & "" If srcCSV Like "csvAWR*" Then Set delFILE = CreateObject("Scripting.FileSystemObject") delFILE.Deletefile tempFILE, True End If srcCSV = Dir Loop fnameLOGclean = "csvAWRclean.out" Set objFSO = CreateObject("Scripting.FileSystemObject") Set objCLEAN = objFSO.CreateTextFile(rawLOGpath & "\" & fnameLOGclean, 1) 'build array of log names up front, use those as source Dim logNameArr(10000, 1) As Variant rawLOG = Dir(rawLOGpath) logCT = 0: logSO = 0 Do While rawLOG <> "" If rawLOG Like "*.txt" Then logCT = logCT + 1 logNameArr(logCT, 1) = rawLOG End If rawLOG = Dir Loop 'setup array for caching results ReDim awrTextArr(logCT + 1, 1) As Variant 'loop through statspack files For logLoop = 1 To logCT lineCt = 0: curStr = "" rawLOG = logNameArr(logLoop, 1) srcFullPath = rawLOGpath & rawLOG Set objFSO = CreateObject("Scripting.FileSystemObject") Set objrawLOG = objFSO.OpenTextFile(rawLOGpath & rawLOG, 1) Do Until objrawLOG.AtEndOfStream txnstring = objrawLOG.ReadLine lineCt = lineCt + 1 'DB info If txnstring Like "DB Name*" Then curStr = "DBname": tgtLine = lineCt + 2 If tgtLine = lineCt And curStr = "DBname" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(0) & "|" & splitArray(2) & "|" _ & splitArray(6) & "|" & splitArray(7) & "|" curStr = "" headerLine = headerLine & "dbname|instance|release|RAC|" End If 'Host info If txnstring Like "Host Name*" Then curStr = "HOSTname": tgtLine = lineCt + 2 If tgtLine = lineCt And curStr = "HOSTname" Then StrOut = StrOut & Trim(Left(txnstring, 16)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 49), 32)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 54), 4)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 60), 4)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 68), 7)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 80), 10)) & "|" curStr = "" headerLine = headerLine & "host name|platform|CPUs|Cores|Sockets|Memory(GB)|" End If 'unique strings If txnstring Like "Begin Snap:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & " " & splitArray(4) & "|" headerLine = headerLine & "begin snap ID|begin time|" End If If txnstring Like " End Snap:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & " " & splitArray(4) & "|" headerLine = headerLine & "end snap ID|end time|" End If If txnstring Like " Buffer Cache:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" & splitArray(7) & "|" headerLine = headerLine & "buffer cache|block size|" End If If txnstring Like " Logical reads:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" headerLine = headerLine & "lrps|" End If If txnstring Like " Physical reads:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" headerLine = headerLine & "prps|" End If If txnstring Like "*Buffer Hit %:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(3) & "|" headerLine = headerLine & "BCHR|" End If 'CPU utilization If txnstring Like "*Host CPU (CPUs*" Then curStr = "CPUutil": tgtLine = lineCt + 4 If tgtLine = lineCt And curStr = "CPUutil" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & _ "|" & splitArray(4) & "|" & splitArray(5) & "|" curStr = "": tgtLine = 0 headerLine = headerLine & "%user|%system|%wio|%idle|" End If 'Top 5 timed events If txnstring Like "Top 5 Timed Foreground Events*" Then curStr = "top5": tgtLine = lineCt + 6 If tgtLine = lineCt And curStr = "top5" Then StrOut = StrOut & Trim(Left(txnstring, 30)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 43), 12)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 55), 11)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 62), 6)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 69), 6)) & "|" tgtLine = tgtLine + 1: loopInc = loopInc + 1 headerLine = headerLine & "event" & loopInc & "|waits|time(s)|avg wait(ms)|%DB time|" If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0 End If 'top 3 SQL by elapsed time If txnstring Like "*SQL ordered by Elapsed Time*" Then curStr = "SQLbyTime" If curStr = "SQLbyTime" And loopInc = 0 And txnstring Like "----------*" Then tgtLine = lineCt + 1 If curStr = "SQLbyTime" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1 If tgtLine = lineCt And curStr = "SQLbyTime" And loopInc < 4 Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(6) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _ splitArray(2) & "|" & splitArray(3) & "|" loopInc = loopInc + 1 headerLine = headerLine & "elapsed" & loopInc & " sqlID|elapsed(sec)|execs|ms/exec|%total|" End If If txnstring Like "*SQL ordered by CPU Time*" Then loopInc = 0: tgtLine = 0: curStr = "" 'top 3 SQL by Gets If txnstring Like "*SQL ordered by Gets*" Then curStr = "SQLbyGets" If curStr = "SQLbyGets" And loopInc = 0 And txnstring Like "----------*" Then tgtLine = lineCt + 1 If curStr = "SQLbyGets" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1 If tgtLine = lineCt And curStr = "SQLbyGets" And loopInc < 4 Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(7) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _ splitArray(2) & "|" & splitArray(3) & "|" loopInc = loopInc + 1 headerLine = headerLine & "gets" & loopInc & " sqlID|gets|execs|gets/exec|%total|" End If If txnstring Like "*SQL ordered by Reads*" Then loopInc = 0: tgtLine = 0: curStr = "" 'Segments by parser 'there are multiple similarly structured tables in the segments by section 'they are variable in length, and some have an extra line in the table header 'this section sets up the target line start and the custom header variables If txnstring Like "*Segments by Logical Reads*" Then curStr = "SegByLog": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by log" header2 = "|name|object|type|logical reads|%total|" End If If txnstring Like "*Segments by Physical Reads*" Then curStr = "SegByPhys": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by phys" header2 = "|name|object|type|physical reads|%total|" End If If txnstring Like "*Segments by Physical Writes*" Then curStr = "SegByPhysW": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by writes" header2 = "|name|object|type|physical writes|%total|" End If If txnstring Like "*Segments by Table Scans*" Then curStr = "SegByScans": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by scans" header2 = "|name|object|type|table scans|%total|" End If If txnstring Like "*Segments by Row Lock Waits *" Then curStr = "SegByLocks": tgtLine = lineCt + 8: testEmpty = lineCt + 2 header1 = "segs by rlwait" header2 = "|name|object|type|row lock waits|%total|" End If If curStr Like "SegBy*" And lineCt = testEmpty And txnstring Like " No data exists for this section of the report.*" Then For t = 1 To 5 StrOut = StrOut & "||||||" headerLine = headerLine & header1 & loopInc & header2 Next t curStr = "": loopInc = 0: tgtLine = 0 End If 'execute "segments by" parsing logic 'since the table is variable length, look for the end line and skip further processing If tgtLine = lineCt And curStr Like "SegBy*" Then If txnstring Like " ------------------------------*" Then For t = loopInc + 1 To 5 StrOut = StrOut & "||||||" headerLine = headerLine & header1 & loopInc & header2 Next t curStr = "": loopInc = 0: tgtLine = 0 Else StrOut = StrOut & Trim(Left(txnstring, 11)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 21), 10)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 42), 20)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 59), 5)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 72), 12)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 80), 7)) & "|" tgtLine = tgtLine + 1: loopInc = loopInc + 1 headerLine = headerLine & header1 & loopInc & header2 If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0 End If End If Loop 'record output string 'caching to an array so that I can build out a custom header If logLoop = 1 Then awrTextArr(1, 1) = headerLine awrTextArr(logLoop + 1, 1) = StrOut StrOut = "": headerLine = "" Next logLoop 'loop through array, write output to flie For i = 1 To logCT + 1 objCLEAN.WriteLine awrTextArr(i, 1) Next i 'clean up objrawLOG.Close End Sub
I fully admit I cheated and using the worksheet TRIM function in a few cases to collapse consecutive spaces for easier splitting on variable length fields. There are likely a few ways to solve this by walking the string and counting sections, but TRIM was easier. I may revisit some of my brute force approaches to relying on fixed field length parsing.
So far I have used this on 11.2.0.3.0 output. We’ll see if it holds up as other samples from different Oracle versions come in. I’d been putting off writing this script for a few years, focusing my initial efforts on the Oracle AWR output, but I finally broke down and wrote this script after yet another RAC cluster set of reports came in (24 x 1 hour samples across a 4 node cluster for a total of 96 reports…).
Parsing data from a single line in Oracle Statspack
The simplest use case is to pull data from multiple columns in a single line. In this case, I used the worksheet TRIM function to eliminate extra spaces in the variable length lines, followed by the split function using a space as a delimiter. Then I pulled the specific “column” of interest:
If txnstring Like "*Buffer Hit %:*" Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(3) & "|" headerLine = headerLine & "BCHR|" End If
Parsing data from a fixed row count Oracle Statspack table
There are tables within Oracle Statspack that have a fixed number of entries. An example of this is the “top 5 timed events” that will always have 5 entries in a production system. I have not run across a case where there are not five entries, so we can simplify the parsing of this table by simply keeping a count of the lines that have been processed.
'Top 5 timed events If txnstring Like "Top 5 Timed Foreground Events*" Then curStr = "top5": tgtLine = lineCt + 6 If tgtLine = lineCt And curStr = "top5" Then StrOut = StrOut & Trim(Left(txnstring, 30)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 43), 12)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 55), 11)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 62), 6)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 69), 6)) & "|" tgtLine = tgtLine + 1: loopInc = loopInc + 1 headerLine = headerLine & "event" & loopInc & "|waits|time(s)|avg wait(ms)|%DB time|" If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0 End If
You can see that I used a fixed format parsing method for counting character positions based on the expected table layout. We’ll see if they modify the column assignments by release 🙂
Parsing data from a variable length Oracle Statspack table
This is a section of the script where I pulled the top three SQL queries ordered by buffer gets. There are two if statements that skip lines that should be ignored. The parsing will run until the next table is encountered (SQL by Reads):
'top 3 SQL by Gets If txnstring Like "*SQL ordered by Gets*" Then curStr = "SQLbyGets" If curStr = "SQLbyGets" And loopInc = 0 And txnstring Like "----------*" Then tgtLine = lineCt + 1 If curStr = "SQLbyGets" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1 If tgtLine = lineCt And curStr = "SQLbyGets" And loopInc < 4 Then tempStr = Application.WorksheetFunction.Trim(txnstring) splitArray() = Split(tempStr, " ") StrOut = StrOut & splitArray(7) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _ splitArray(2) & "|" & splitArray(3) & "|" loopInc = loopInc + 1 headerLine = headerLine & "gets" & loopInc & " sqlID|gets|execs|gets/exec|%total|" End If If txnstring Like "*SQL ordered by Reads*" Then loopInc = 0: tgtLine = 0: curStr = ""
Parsing a variable length Oracle Statspack tables with the same base format
I have a section in the script that uses the same parsing method for multiple tables that have the same fixed format dimensions. These “Segment by” tables have the same column counts but different headers. The complicating factor is in some cases, these tables contain no data. I have to trap that condition and maintain the output string dimensions:
'Segments by parser 'there are multiple similarly structured tables in the segments by section 'they are variable in length, and some have an extra line in the table header 'this section sets up the target line start and the custom header variables If txnstring Like "*Segments by Logical Reads*" Then curStr = "SegByLog": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by log" header2 = "|name|object|type|logical reads|%total|" End If If txnstring Like "*Segments by Physical Reads*" Then curStr = "SegByPhys": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by phys" header2 = "|name|object|type|physical reads|%total|" End If If txnstring Like "*Segments by Physical Writes*" Then curStr = "SegByPhysW": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by writes" header2 = "|name|object|type|physical writes|%total|" End If If txnstring Like "*Segments by Table Scans*" Then curStr = "SegByScans": tgtLine = lineCt + 7: testEmpty = lineCt + 2 header1 = "segs by scans" header2 = "|name|object|type|table scans|%total|" End If If txnstring Like "*Segments by Row Lock Waits *" Then curStr = "SegByLocks": tgtLine = lineCt + 8: testEmpty = lineCt + 2 header1 = "segs by rlwait" header2 = "|name|object|type|row lock waits|%total|" End If If curStr Like "SegBy*" And lineCt = testEmpty And txnstring Like " No data exists for this section of the report.*" Then For t = 1 To 5 StrOut = StrOut & "||||||" headerLine = headerLine & header1 & loopInc & header2 Next t curStr = "": loopInc = 0: tgtLine = 0 End If
The prior section sets the type of table that is being parsed with the proper identifiers and header strings. It also contains the logic to look for the tables that contain no data. Once the type of table is established, the tables are parsed and the content is appended to the output string:
'execute "segments by" parsing logic 'since the table is variable length, look for the end line and skip further processing If tgtLine = lineCt And curStr Like "SegBy*" Then If txnstring Like " ------------------------------*" Then For t = loopInc + 1 To 5 StrOut = StrOut & "||||||" headerLine = headerLine & header1 & loopInc & header2 Next t curStr = "": loopInc = 0: tgtLine = 0 Else StrOut = StrOut & Trim(Left(txnstring, 11)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 21), 10)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 42), 20)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 59), 5)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 72), 12)) & "|" StrOut = StrOut & Trim(Right(Left(txnstring, 80), 7)) & "|" tgtLine = tgtLine + 1: loopInc = loopInc + 1 headerLine = headerLine & header1 & loopInc & header2 If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0 End If End If