Application Performance Issue and Troubleshooting


Initial Checks

When a customer reports SummitAI application performance issues, perform the following activities to determine whether the issue lies with the SummitAI Server/Application or with the customer environment.

On-Premise

Access the Application directly from the SummitAI Application Server to check the performance of the Application. Access different pages, such as User Dashboard, New Incident, New SR, Incident details page, etc. 

On Cloud

Access the Application using an office network or data card to check the Application performance. 

Checklist for Application Performance Issue

If the Application is also slow when accessed locally, from the SummitAI office network, or over a data card, perform the following checks:

Hardware Sizing (On-Premise Customer)

Validate if the Hardware for Application and database is configured as per the prerequisites. A lower configuration can create performance issues.

Is 32-bit application mode enabled?

The application pool must run in 64-bit mode. Hence, ensure that the Enable 32-Bit Applications setting of the application pool is set to False.

nGEN

After the SummitAI application is installed, ensure all the application DLLs are pre-compiled using the nGEN tool. Pre-compiled DLLs execute faster than non-pre-compiled DLLs.

Caching

Ensure caching is enabled at the application server level. All the static files like images and JavaScript files will get cached at the client-side after this setting is enabled. This will ensure a faster page load.

Note:

Do not test in a private or incognito browser window. These modes do not use the cached data and always download the static content from the server.

Compression

Ensure Static and Dynamic compression is enabled in IIS.

Disk Space

Ensure adequate free space is available on the C drive for virtual memory.

CPU Utilization on Application Server

Using Task Manager (show all processes), check the CPU utilization of w3wp.exe (the IIS worker process). If the utilization of this process is consistently high for a few minutes, perform the following steps:

  1. Recycle the application pool.
  2. If utilization is still high, reset the IIS (Inform the users if other applications are running on the same IIS server).

    Note:

    Often, more than one SummitAI application component (multiple application instances, Data Collector) runs on the same web server. Check the Command line column in Task Manager to see the folder of the instance/DC and identify the exact application that is causing the high CPU utilization. Recycle only the impacted application.

    Resetting IIS impacts all the application instances. Ensure each application virtual directory is configured with a different application pool.

    If a process other than a SummitAI process is utilizing high CPU, please contact the server administrator to take appropriate action.

Anti-Virus Blocking SummitAI Application

Add the SummitAI folder to the anti-virus exclusion list so that the anti-virus system does not block or kill the SummitAI processes and degrade performance.

Latency Between Application and DB Server

Ensure that the latency between the Application Server and the Database Server is less than 10 ms. If this is not met, the performance of the Application degrades significantly.
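One rough way to check this from the Application Server is to time TCP connections to the database port, which also works where ping is blocked. The following Python sketch is illustrative only: the host name in the comment is a placeholder, port 1433 is assumed to be the SQL Server port in use, and TCP connect time only approximates ping latency. The 10 ms default is the threshold stated in this section.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return a list of TCP connect times to host:port, in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection completes the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    return timings


def latency_ok(timings_ms, threshold_ms=10.0):
    """Return True when the median connect time is below the threshold."""
    return statistics.median(timings_ms) < threshold_ms


# Example call (hypothetical host name, assumed port):
#   timings = tcp_connect_latency_ms("db-server.example.local", 1433)
#   print(statistics.median(timings), latency_ok(timings))
```

The median is used rather than the mean so that a single slow handshake does not dominate the result.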

DB Query Timeout

Check the Application logs or DAL logs in the Application folder or Data Collector services folder for query timeouts. Query timeouts can occur due to missing indexes, maintenance jobs that have not been performed, or query blocks. Ensure all the maintenance activities are performed periodically. Also, avoid scheduling jobs that use a common data source to run in parallel.
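Scanning the log folders by hand can be slow, so a small script may help. The Python sketch below counts lines that contain the phrase "Timeout expired", which is the wording .NET SqlClient uses for command timeouts. The `*.log` file pattern and the assumption that the SummitAI logs are plain UTF-8 text containing that phrase are not confirmed by this guide; adjust both to match the actual log files.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed wording of a timeout entry; matched case-insensitively.
TIMEOUT_PATTERN = re.compile(r"timeout expired", re.IGNORECASE)


def find_timeout_lines(lines):
    """Return (line_number, text) pairs for lines that mention a timeout."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if TIMEOUT_PATTERN.search(line)
    ]


def count_timeouts_per_file(log_dir, glob="*.log"):
    """Count timeout lines in every matching file directly under log_dir."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob(glob)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            counts[path.name] = len(find_timeout_lines(handle))
    return counts
```

Files with a high count point to the component and time window worth checking against missing indexes, maintenance jobs, and query blocks.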

DB Cluster Sync Issues

If the DB is configured in cluster mode with synchronous data replication, ensure that both servers are in the same subnet, that the ping response time between them is minimal, and that no firewall rules are set between them. In synchronous replication mode, a transaction is not complete until the data is updated on all the cluster nodes. If the servers are not in the same subnet or the response time is high, set the sync mode to asynchronous.

Periodically, run the cluster health commands to validate the cluster health.

Query Blocks

Run the following commands to identify if any queries are blocked by another query. Generally, this issue would occur when the same table is used by multiple processes simultaneously.

Note:

If you find any blocks, do not kill them unless a block is held for several minutes. In most cases, SQL Server itself clears the blocks. Kill the process only if the block is not cleared by SQL Server.

-- Table variable that holds the output of sp_who2 so it can be filtered.
declare @tbl table (SPID numeric(18,0)
,Status nvarchar(300)
,Login nvarchar(300)
,HostName nvarchar(300)
,BlkBy nvarchar(300)
,DBName nvarchar(300)
,Command nvarchar(300)
,CPUTime numeric(18,0)
,DiskIO numeric(18,0)
,LastBatch nvarchar(300)
,ProgramName nvarchar(300)
,SPID1 numeric(18,0)
,REQUESTID numeric(18,0)
)
-- Report the oldest active transaction in the current database.
dbcc opentran
insert into @tbl (SPID, Status, Login, HostName, BlkBy, DBName, Command,
CPUTime, DiskIO, LastBatch, ProgramName, SPID1, REQUESTID)
exec sp_who2
declare @spid int
declare @blk_by int
-- Sessions that sp_who2 lists more than once.
select dbname,status,spid,count(1) No_of_time_Parallel from @tbl group by dbname,status,spid having count(1) > 1
-- Replace 'customerdbname' with the name of the customer database.
-- sp_who2 shows a dot in BlkBy when a session is not blocked, so any other value is a blocker SPID.
select top 1 @spid = spid,@blk_by = BlkBy from @tbl where dbname = 'customerdbname' and ltrim(rtrim(BlkBy)) != '.'
select * from @tbl where dbname = 'customerdbname' and ltrim(rtrim(BlkBy)) != '.'
-- Fetch the SQL text of the blocked session.
DECLARE @sqltext VARBINARY(128)
SELECT @sqltext = sql_handle
FROM sys.sysprocesses
WHERE spid = @spid
-- Fetch the SQL text of the session that is blocking it.
DECLARE @sqltext1 VARBINARY(128)
SELECT @sqltext1 = sql_handle
FROM sys.sysprocesses
WHERE spid = @blk_by
SELECT TEXT process_Running
FROM sys.dm_exec_sql_text(@sqltext)
SELECT TEXT Process_blocked_by
FROM sys.dm_exec_sql_text(@sqltext1)
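When several sessions are blocked in a chain, it helps to find the session at the head of the chain, since clearing that one usually releases the rest. The Python sketch below is an illustration only: it assumes the SPID and BlkBy values from the result above have been copied into a dictionary by hand, and the function name is hypothetical.

```python
def find_head_blockers(blocked_by):
    """Return the session IDs that block others but are not blocked themselves.

    blocked_by maps each session ID to the ID of the session blocking it,
    or to 0 / None when the session is not blocked.
    """
    waiting = set()   # sessions that are blocked by another session
    blocking = set()  # sessions that block at least one other session
    for spid, blocker in blocked_by.items():
        # Ignore unblocked sessions and rows where a session lists itself.
        if blocker and blocker != spid:
            waiting.add(spid)
            blocking.add(blocker)
    return sorted(blocking - waiting)
```

For example, if session 53 waits on 52 and 52 waits on 51, the function returns only 51, which is the session to examine first.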

Use the following command to kill the blocking process if necessary:

KILL <Process ID>

Database Maintenance

If database maintenance is not done periodically or indexes are not properly set, Application performance degrades and DB queries take longer to return the data.

Check the following table to set the DB indexes and maintenance jobs:

| No. | Activity | Schedule | Procedure/Wizard-Based | Remarks |
|---|---|---|---|---|
| 1 | Rebuilding indexes | During off-peak hours | USP_SUMMIT_DB_MAINTENANCE | To be scheduled weekly. |
| 2 | Update statistics | During off-peak hours. If 24/7, 3 o'clock UTC | USP_SUMMIT_DB_UPDATE_STATISTICS | To be scheduled weekly. |
| 3 | Cleanup history | 9:00 PM every Saturday | Wizard-Based | To be scheduled weekly. |
| 4 | Backup plan | Full backup once a week at 3 o'clock UTC on Sunday. Transaction log backup every 2 hours. | Wizard-Based | Transaction log to be backed up every 2 hours. |
| 5 | Shrinking log, to be performed if there is no maintenance plan for transaction log backup | Once a week | USP_SUMMIT_DB_STANDLONE_LOG_SHRINK | To be performed if there is a shortage of storage space. |
| 6 | Check for missing indexes and create them | Once every 3 months | USP_SUMMIT_DB_CREATE_MISSING_INDEX | Analyze the missing indexes and then execute the procedure. |
| 7 | Remove indexes that are not used for reading | Once every 5 months | USP_SUMMIT_DB_DROP_UNUSED_INDEX | Indexes to be removed if never used or if the read-to-write ratio is low. |
| 8 | HDD space availability | | | DBAs to ensure HDD space manually. |
| 9 | Large tables to be archived | | | Share the tables to be archived in every module periodically. |
| 10 | DBCC check (command "DBCC CHECKDB") | Once a week | | This command is used to check if DB tables are corrupted. |

Note:

The DB index requirement may vary from customer to customer and depends purely on the usage pattern. Our DBA team provides updated versions of the maintenance guides as and when required. Always refer to the latest guide. To identify the missing indexes for a specific customer, use SQL Server Profiler or the following query:

SELECT CONVERT (varchar, getdate(), 126) AS runtime, mid.statement,
mig.index_group_handle, mid.index_handle,
CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact *
(migs.user_seeks + migs.user_scans)) AS improvement_measure,
'CREATE INDEX IX_'+ object_name(mid.object_id) +'_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) + '
ON ' + mid.statement + ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL
AND mid.inequality_columns IS NOT NULL
THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL ('
INCLUDE (' + mid.included_columns + ')', '')+'
GO' AS create_index_statement,
'DROP INDEX IX_'+ object_name(mid.object_id) +'_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)+'
ON '+mid.statement+'
GO' as drop_query,
migs.avg_total_user_cost,migs.*,
mid.database_id,
mid.[object_id]
FROM sys.dm_db_missing_index_groups AS mig
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
where migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) > 1000
ORDER BY mid.statement asc,migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

Caution!

Too many indexes can also cause performance issues. Create new indexes cautiously.
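The improvement_measure column in the missing-index query is the product of the average query cost, the expected percentage impact, and the number of seeks plus scans, and the query keeps only rows scoring above 1000. The short Python sketch below reproduces that arithmetic so the scores can be sanity-checked; the sample numbers in the usage note are made up for illustration.

```python
def improvement_measure(avg_total_user_cost, avg_user_impact, user_seeks, user_scans):
    """Same formula as the improvement_measure column of the query."""
    return avg_total_user_cost * avg_user_impact * (user_seeks + user_scans)


def worth_reviewing(measure, cutoff=1000):
    """Mirror the query's WHERE clause: keep only scores above the cutoff."""
    return measure > cutoff
```

With a cost of 12.5, an impact of 80, 40 seeks, and 10 scans, the score is 12.5 * 80 * 50 = 50000, so the suggestion passes the filter. A cost of 2.0, an impact of 10, and 5 lookups in total give 100, so it is filtered out.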

Database Memory Utilization

By design, SQL Server uses all the available memory and does not release it back to the operating system. When too little memory is left for the operating system and network-related operations, database response might be slow. It is advisable to set a memory cap for database usage and keep some portion of memory reserved for the operating system and network.

Note:

All the database related activities defined above need to be performed on all the nodes of the cluster if the SQL cluster is enabled.

If the application is set up in load balancer mode, ensure all application-related configuration is set on all the nodes of the load balancer. Ideally, all the nodes of the load balancer/cluster should have a similar configuration.

If a SQL Server Management Studio console is open with transactions that were executed but not committed, it will lock the tables and create performance issues. Please ensure all such consoles are closed while testing the performance.

Some customers might complain about Application slowness at a specific time of the day or at regular intervals, while the Application performance is normal at other times. In such a scenario, validate the background jobs scheduled on the Application Server or Database Server (like denormalization, SLA calculation, Asset/AVM data posting, DB maintenance, scheduled backups, VM backup, etc.). Identify the job that is causing the issue and take appropriate fine-tuning action.

Some specific pages, like the Availability view and reports, could be slow for different reasons, such as incorrect customer queries, an incorrect version of SummitAI, too many records per view, no pagination set, or no archival of records. These cases need to be validated on a case-by-case basis.
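For slowness that recurs at particular times of day, grouping response-time samples by hour can show which window to compare against the job schedule. The Python sketch below is a minimal illustration: it assumes the samples have already been collected as (timestamp, response time in milliseconds) pairs from whatever monitoring is available, and the function name and threshold are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slow_hours(samples, threshold_ms):
    """Return {hour_of_day: average_ms} for hours whose average exceeds the threshold.

    samples is an iterable of (datetime, response_ms) pairs.
    """
    by_hour = defaultdict(list)
    for timestamp, response_ms in samples:
        by_hour[timestamp.hour].append(response_ms)
    return {
        hour: round(mean(values), 1)
        for hour, values in sorted(by_hour.items())
        if mean(values) > threshold_ms
    }
```

If the result shows, say, only hour 14 above the threshold across several days, the jobs scheduled around that hour are the first candidates to review.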

Checklist for On-Premise Customers

  1. Application, Proxy, DC, and Database server’s CPU and memory utilization: Check which component is occupying the resources.
  2. Recycle the application pool. If the utilization is still high, reset IIS (inform the users if other applications are running on the same IIS server).
  3. Check the network latency from the Application to the database and vice versa. Ping latency should not exceed 5 milliseconds.
  4. Check whether there is any continuous blocking in the database by using the following queries:
    1. sp_who2 active
    2. select * from sysprocesses where blocked >0
  5. Check the fragmentation level of all tables. If the fragmentation level is high check the maintenance job status.
  6. Check if antivirus is blocking the Application, network traffic, or disk, etc.
  7. Check for any long-running queries or jobs in the database.
  8. Check if any patching is done on SummitAI servers.

Checklist for On Cloud Customers

  1. What module(s) and pages are specifically reported as slow?
  2. Check the internet speed
    1. Access the portal "http://speedtest.net".
    2. Change the Target Server by clicking Change Server.
    3. Based on the Azure Hosting, change this location.
    4. Click Go.
    5. Capture the Latency (ping), Download and upload.
    6. Measure the page load time using a page load time plugin in the Chrome browser.
  3. Traceroute to the customer instance from the customer network.
    1. traceroute <customer instance URL without http/https> (Linux/macOS) or
    2. tracert <customer instance URL without http/https> (Windows)
  4. Ping report to any public IPs (for example, 8.8.8.8 or 4.4.4.4) from the customer system where the issue is reported.
    1. Azure latency report from the customer system where the issue is reported (http://www.azurespeed.com/Azure/Latency).
  5. Check for errors while accessing the SummitAI application.


After performing one or more of the activities defined in the above list, if the performance is still unsatisfactory, please contact the Support or Cloud Management team for additional help/troubleshooting. If the performance of the application is good in most cases but bad for a few pages/transaction types, please contact the Support or Cloud Management team.

Checklist for Customer Infrastructure

If the Application performance is good locally or from the SummitAI network or by using the data cards, the issue is related to Client infrastructure. In such a case, the customer should check with their IT and Infrastructure Support team. The following can be the possible reasons (but not limited to) for Application slowness from client networks that the customers can check:

  • Low Bandwidth: Check the bandwidth available from the affected client location to the SummitAI Server location.
  • Bandwidth Throttling/QOS configuration: Bandwidth capping or low priority might be set at the customer end for specific application or location.
  • Latency: Latency from the client location to Application Server might be very high.
  • Traffic Sniffing: Traffic sniffers may be enabled at the client location, which may slow down the Application access.
  • Firewall rules: Different firewall rules could block or slow down the Application access.
  • Network Routing: Incorrect network routing configuration can slow down the Application access.
  • LAN v/s WiFi: WiFi bandwidth is typically lower than LAN bandwidth and is shared. If many users are connected to the same WiFi access point, the bandwidth allocated per computer is less. Also, the WiFi signal strength can impact performance. Always ask the customer to test the performance on the LAN network.
  • Firewall setting to handle vulnerabilities: Some of the firewall settings that handle vulnerabilities like CSRF might block the caching enabled at the IIS server. This could make the Application download-heavy and impact the performance.
  • Domain controller/Authentication provider: Slowness caused by the DC or any authentication provider will delay the Application login process.
  • Anti-Virus Blocking Summit agent: Anti-Virus software might be blocking the SummitAI Agent for Asset or ITOps modules. Ensure that the SummitAI folders are excluded in AV configuration.