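Oracle AWR Report Metrics Explained [Performance Tuning]

Oracle Tuning Hawk Eyes, Understanding the AWR Performance Report: http://www.…
Oracle Tuning Hawk Eyes, Understanding the AWR Performance Report, Part 2: http://www.…

After watching the "Oracle Tuning Hawk Eyes: Understanding the AWR Performance Report" teaching videos, many viewers are eagerly waiting for Part 3. In practice, Part 3 needs a good deal of background theory: sections such as Latch Activity, Undo, and Dynamic Resource Master only make sense once you understand the mechanisms behind them. Those parts of AWR will therefore be covered in Maclean's future tuning lecture series.

For the "Oracle Tuning Hawk Eyes" series, this appendix is added as an introduction to all Oracle AWR metrics. It says little about internals and concentrates on what each metric means; think of it as the reference document for the AWR Hawk Eyes lectures. If any metric here is still unclear or poorly explained, leave a comment on this page. Thanks for your support.

Hawk Eyes: the hawk eye for reading AWR = solid fundamentals + having read 5… AWR reports.

What is AWR?
=====================================================================================================

AWR (Automatic Workload Repository) is a pile of historical performance data stored in the SYSAUX tablespace. AWR and SYSAUX are both 1… and are key features for Oracle tuning.

Snapshots are taken roughly every 1…. The snapshot settings are changed with DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS, and the current settings are visible in DBA_HIST_WR_CONTROL.

The core of the AWR machinery is the dbms_workload_repository package.

@?/rdbms/admin/awrrpt     report for the current instance
@?/rdbms/admin/awrrpti    choose the instance number in RAC

Who maintains AWR?

Mainly MMON (Manageability Monitor Process) and its slave processes (m…). MMON's duties include:

1. Starting the slave processes (m…) that take the AWR snapshots.
2. Raising an alert when a metric threshold is exceeded.
3. Capturing metrics for recently changed SQL objects.

AWR tips

Take a snapshot manually:
Exec dbms_workload_repository.… (the procedure name is garbled in the source; CREATE_SNAPSHOT is the package's snapshot procedure)

Create an AWR baseline:
Exec DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id, end_snap_id, baseline_name);

@?/rdbms/admin/awrddrpt   AWR compare-periods report
@?/rdbms/admin/awrgrpt    RAC global AWR report

Automatically generating AWR HTML reports: http://www.…

WORKLOAD REPOSITORY report for

DB Name   DB Id   Instance   Inst Num   Startup Time   Release   RAC
MAC       …       …          …          …              …         YES

Host Name   Platform                 CPUs   Cores   Sockets   Memory(GB)
MAC1…       AIX-Based Systems (6…    …      …       …         …

              Snap Id   Snap Time   Sessions   Curs/Sess
Begin Snap:   …         …           …          …
End Snap:     …         …           …          …
Elapsed:      2…
DB Time:      7,6…

Elapsed is the time span the AWR report covers, measured in wall-clock time. For example, if the earlier snapshot was taken at 4 o'clock and the later one at 6 o'clock, and you give those two snapshots to the @?/rdbms/admin/awrrpt script, then elapsed = (6 - 4) = 2 hours. An AWR report needs at least two AWR snapshots. (The instance must not have been restarted between the two snapshots; otherwise generating a report from them raises an error.) The metrics in an AWR report are usually the delta between the later snapshot and the earlier one, because cumulative values cannot reflect the workload during a particular period.

DB Time = the total time all foreground sessions spend in database calls.

- Note that this is foreground sessions only.
- It includes CPU time, I/O time, and all other non-idle wait time. Don't forget time spent waiting on the CPU run queue.

DB Time is not response time. A high DB Time does not necessarily mean slow response, and a low DB Time does not necessarily mean fast response. DB Time describes the overall load on the database, but it has to be read together with elapsed time.

Average Active Sessions: AAS = DB Time / Elapsed Time

DB Time = 60 min, Elapsed Time = 60 min: AAS = 60/60 = 1, a moderate load
DB Time = 1 min, Elapsed Time = 60 min: AAS = 1/60, a very light load
DB Time = 6… min, Elapsed Time = 60 min: AAS = 1…

DB Time = DB CPU + Non-Idle Wait + Wait on CPU queue

If there are only 2 logical CPUs and 2 sessions spend the whole 60 minutes on CPU, then:
DB CPU = 2 * 60 mins, DB Time = 2 * 60 + 0 + 0 = 120, AAS = 120/60 = 2, which is exactly an OS load of 2.

If 3 sessions all demand CPU for the whole period, one of them is always waiting on the queue:
DB CPU = 2 * 60 mins, wait on CPU queue = 60 mins, AAS = (120 + 60)/60 = 3.

In a real system: DB CPU = xx mins, Non-Idle Wait = enq: TX + cursor: pin S wait on X + latch: xxx + db file sequential read + ...

Cache Sizes          Begin    End
Buffer Cache:        4…M      49,152M     Std Block Size:   8K
Shared Pool Size:    1…M      13,312M     Log Buffer:       3…K

Memory management modes: MSMM, ASMM (sga_target), AMM (memory_target).

Small memory has its own problems, and large memory has its own troubles! ORA-0…

Keep in mind that the begin/end values of Buffer Cache and Shared Pool Size can move under ASMM, AMM, and 1…R2 MSMM. If the shared pool keeps shrinking, some row cache objects are locked during the shrink, which can cause foreground parse-related waits such as row cache lock; it is best not to let the shared pool shrink. If the shared pool keeps growing, its original size was not enough for the demand (possibly because of heavy hard parsing). Diagnose this together with the parse statistics and the SGA breakdown later in the report.

1-2 Load Profile

Load Profile           Per Second   Per Transaction   Per Exec   Per Call
DB Time(s):            2…
DB CPU(s):             3…
Redo size:             1,0…
Logical reads:         1…
Block changes:         6,3…
Physical reads:        5,0…
Physical writes:       3…
User calls:            1…
Parses:                2…
Hard parses:           0…
W/A MB processed:      5…
Logons:                1…
Executes:              3,9…
Rollbacks:             1,1…
Transactions:          1,2…

Blocks changed per Read: 5…    Recursive Call %: 9…
Rollback per transaction %: 3…    Rows per Sort: 7…

Redo size: a large value means I/O pressure. The Per Transaction figure helps you tell whether the workload is many small transactions or a few large ones. In the example above, redo is about 1… MB per second and 800 bytes per transaction, which fits an OLTP profile.

Logical reads: the unit is count * blocks, much like "person-times". In the example above this is 1… MB/s. Logical reads burn CPU, so both clock speed and core count matter. When logical reads are high, DB CPU is usually high as well, and latch: cache buffers chains waits often show up. Many OLTP systems (Siebel, for example) reach tens or even hundreds of Gbytes.

Block changes: the unit is count * blocks. It describes how frequently data changes.

Physical reads: the unit is count * blocks; in the example above, 5… MB/s. Physical reads consume read I/O, which shows up in different dimensions such as IOPS and throughput, but reducing physical reads may mean burning more CPU. Good storage, Exadata for example, can deliver several GB of physical reads per second. This figure includes both physical reads cache and physical reads direct.

Physical writes: the unit is count * blocks. Mostly DBWR writing to datafiles, plus direct path writes. If DBWR is slow to write over a long period, foreground sessions will periodically wait on log file switch (checkpoint incomplete). This figure includes physical writes direct + physical writes from cache.

User calls: the unit is a count of user calls; more details from internal.

Parses: the number of parses, soft plus hard. If soft parsing is poorly optimized, this number is, with only slight exaggeration, almost equal to the number of SQL executions per second, that is, an execute-to-parse ratio of 1:1. What we want is parse once, run everywhere!

Hard parses: the root of all evil. They bring cursor: pin S wait on X, library cache: mutex X, latch: row cache objects / shared pool, and so on. Hard parses should ideally stay below 2… per second.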
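The MB/s figures quoted for logical and physical reads come from multiplying the per-second block count by the block size. Here is a minimal Python sketch of that conversion, assuming the 8K standard block size shown under Cache Sizes. The function name and the sample rates are made up for illustration; they are not values from the report.

```python
def blocks_per_sec_to_mb_per_sec(blocks_per_sec: float,
                                 block_size_bytes: int = 8192) -> float:
    """Convert a Load Profile block rate to MB/s.

    Assumes every block has the same size (the standard block size);
    tablespaces with non-standard block sizes would need their own rate.
    """
    return blocks_per_sec * block_size_bytes / (1024 * 1024)


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
logical_reads_per_sec = 128_000
physical_reads_per_sec = 6_400

print(f"logical reads:  {blocks_per_sec_to_mb_per_sec(logical_reads_per_sec):.1f} MB/s")
print(f"physical reads: {blocks_per_sec_to_mb_per_sec(physical_reads_per_sec):.1f} MB/s")
```

The gap between the two rates is the point of the comparison: logical reads are served from memory and cost CPU, while physical reads cost I/O.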
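W/A MB processed: the unit is MB. W/A stands for workarea; this is the amount of data processed in workareas. Read it together with In-memory Sort %, sorts (disk), and PGA Aggr.

Logons: the number of logons. Watch for logon storms, and read this together with AUDIT data. A side effect of short-lived connections is that cursor caching becomes useless.

Executes: the number of executions; reflects execution frequency.

Rollbacks: the number of rollbacks; reflects rollback frequency. This metric is not very precise, so treat it as a reference only and don't take it too seriously.

Transactions: transactions per second, the TPS at the database layer. It is useful as one indicator in stress tests or performance comparisons, but meaningless in isolation.

% Blocks changed per Read: the ratio of block changes to logical reads. If redo size, block changes, and pct of blocks changed per read are all high, the system is running a lot of insert/update/delete.
pct of blocks changed per read = (block changes) / (logical reads)

Recursive Call %: the proportion of calls that are recursive.
Recursive Call % = (recursive calls) / (recursive calls + user calls)

Rollback per transaction %: the proportion of transactions rolled back.
Rollback per transaction % = (rollbacks) / (transactions)

Rows per Sort: the average number of rows per sort.
Rows per Sort = (sorts (rows)) / (sorts (disk) + sorts (memory))

Note that the Load Profile metrics are given in two dimensions in this section: per second and per transaction.

Per second: the delta between the snapshots divided by the number of seconds between them. For example, suppose V$SYSSTAT shows table scans (long tables) as 1… in snapshot A and 3… in snapshot B, and A and B are one hour (3600 seconds) apart; the per-second value is the difference divided by 3600. Per second is the main dimension we use to examine the data, because performance data detached from a time model is meaningless. In the primeval era of tuning before statspack/AWR, many DBAs tuned from the cumulative statistics in views such as V$SYSSTAT. By today's standards that is slash-and-burn farming.

Per transaction: the transaction-based dimension. Compared with per second, the divisor changes from seconds to the number of transactions in the period. A major use of this dimension is spotting changes in application behavior: if a per-transaction metric changes sharply between two AWR reports, for example redo size per transaction going from 1k to 1…, the SQL business logic has certainly changed in some way.

Note that these AWR metrics are not only for looking at Oracle database load in isolation or for tuning work. For fault diagnosis, such as a HANG or a crash, you can compare the report from the problem period with one from a normal period; comparing the metrics side by side often shows where the problem lies.

The figures are read from the DBA_HIST views with queries of the following form:

SELECT VALUE FROM DBA_HIST_SYSSTAT
 WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2
   AND STAT_NAME IN ('db block changes', 'user calls', 'user rollbacks', 'user commits', 'redo size',
       'physical reads direct', 'physical writes', 'parse count (hard)', 'parse count (total)',
       'session logical reads', 'recursive calls', 'redo log space requests', 'redo entries',
       'sorts (memory)', 'sorts (disk)', 'sorts (rows)', 'logons cumulative', 'parse time cpu',
       'parse time elapsed', 'execute count', 'logons current', 'opened cursors current',
       'DBWR fusion writes', 'gcs messages sent', 'ges messages sent', 'global enqueue gets sync',
       'global enqueue get time', 'gc cr blocks received', 'gc cr block receive time',
       'gc current blocks received', 'gc current block receive time', 'gc cr blocks served',
       'gc cr block build time', 'gc cr block flush time', 'gc cr block send time',
       'gc current blocks served', 'gc current block pin time', 'gc current block flush time',
       'gc current block send time', 'physical reads', 'physical reads direct (lob)'
       /* list truncated in the source */ )

SELECT TOTAL_WAITS FROM DBA_HIST_SYSTEM_EVENT
 WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2
   AND EVENT_NAME IN ('gc buffer busy', 'buffer busy waits' /* list truncated in the source */ )

SELECT VALUE FROM DBA_HIST_SYS_TIME_MODEL
 WHERE DBID = :B4 AND SNAP_ID = :B3 AND INSTANCE_NUMBER = :B2
   AND STAT_NAME IN ('DB CPU', 'sql execute elapsed time', 'DB time' /* list truncated in the source */ )

SELECT VALUE FROM DBA_HIST_PARAMETER
 WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2
   AND PARAMETER_NAME IN ('__db_cache_size', '__shared_pool_size', 'sga_target', 'pga_aggregate_target',
       'undo_management', 'db_block_size', 'log_buffer', 'timed_statistics', 'statistics_level'
       /* list truncated in the source */ )

SELECT BYTES FROM DBA_HIST_SGASTAT
 WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2
   AND POOL IN ('shared pool', 'all pools')
   AND NAME IN ('free memory' /* list truncated in the source */ )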
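Queries like the ones above return cumulative counters for a single snapshot, so the Load Profile still has to be derived by differencing and dividing. The Python sketch below shows one way to do that arithmetic. It rests on several assumptions: the `Snapshot` container, the function name, and all the numbers are hypothetical; transactions are taken to be user commits + user rollbacks; and Recursive Call % uses recursive calls + user calls as its denominator. It illustrates the calculation and is not the AWR report's own code.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Cumulative counters captured at one snapshot (hypothetical container)."""
    epoch_seconds: int
    stats: dict


def load_profile(begin: Snapshot, end: Snapshot) -> dict:
    """Derive per-second, per-transaction and ratio metrics from two snapshots."""
    elapsed = end.epoch_seconds - begin.epoch_seconds
    # AWR works on deltas: later snapshot minus earlier snapshot.
    d = {name: end.stats[name] - begin.stats[name] for name in end.stats}
    # Assumption: transactions = user commits + user rollbacks.
    txns = d["user commits"] + d["user rollbacks"]
    sorts = d["sorts (memory)"] + d["sorts (disk)"]
    return {
        "per_second": {name: value / elapsed for name, value in d.items()},
        "per_transaction": {name: value / txns for name, value in d.items()},
        "blocks_changed_per_read_pct": 100 * d["db block changes"] / d["session logical reads"],
        "recursive_call_pct": 100 * d["recursive calls"] / (d["recursive calls"] + d["user calls"]),
        "rollback_per_transaction_pct": 100 * d["user rollbacks"] / txns,
        "rows_per_sort": d["sorts (rows)"] / sorts,
    }


# Hypothetical counters taken one hour (3600 seconds) apart.
begin = Snapshot(0, {
    "redo size": 1_000_000, "user commits": 500, "user rollbacks": 0,
    "db block changes": 4_000, "session logical reads": 80_000,
    "recursive calls": 1_000, "user calls": 200,
    "sorts (memory)": 10, "sorts (disk)": 0, "sorts (rows)": 5_000,
})
end = Snapshot(3600, {
    "redo size": 4_600_000, "user commits": 4_000, "user rollbacks": 100,
    "db block changes": 40_000, "session logical reads": 800_000,
    "recursive calls": 10_000, "user calls": 1_200,
    "sorts (memory)": 1_000, "sorts (disk)": 10, "sorts (rows)": 75_000,
})

profile = load_profile(begin, end)
print("redo bytes per second:     ", profile["per_second"]["redo size"])
print("redo bytes per transaction:", profile["per_transaction"]["redo size"])
print("blocks changed per read %: ", profile["blocks_changed_per_read_pct"])
print("recursive call %:          ", profile["recursive_call_pct"])
print("rows per sort:             ", profile["rows_per_sort"])
```

Running the same function on a problem period and a normal period gives the side-by-side comparison described above. A jump in a per-transaction value with a steady per-second rate points at the application rather than at the load.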
SELECT BYTES FROM DBA_HIST_SGASTAT
 WHERE SNAP_ID = :B4 AND DBID = :B3 AND INSTANCE_NUMBER = :B2
   AND NAME = :B1 AND POOL IS NULL

SELECT (E.BYTES_PROCESSED - B.BYTES_PROCESSED)
  FROM DBA_HIST_PGA_TARGET_ADVICE B, DBA_HIST_PGA_TARGET_ADVICE E
 WHERE B.DBID = :B4 AND B.SNAP_ID = :B3 AND B.INSTANCE_NUMBER = :B2
   AND B.ADVICE_STATUS = 'ON'
   AND E.DBID = B.DBID AND E.SNAP_ID = :B1 AND E.INSTANCE_NUMBER = B.INSTANCE_NUMBER
   AND E.PGA_TARGET_FACTOR = 1 AND B.PGA_TARGET_FACTOR = 1
   AND E.ADVICE_STATUS = 'ON'

SELECT SUM(E.TOTAL_WAITS - NVL(B.TOTAL_WAITS, 0))
  FROM DBA_HIST_SYSTEM_EVENT B, DBA_HIST_SYSTEM_EVENT E
 WHERE B.SNAP_ID(+) = :B4 AND E.SNAP_ID = :B3
   AND B.DBID(+) = :B2 AND E.DBID = :B2
   AND B.INSTANCE_NUMBER(+) = :B1 AND E. …
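As a closing worked example, the AAS arithmetic from the DB Time discussion can be reproduced in a few lines once DB time and elapsed time are known for a pair of snapshots. This is a minimal Python sketch; the function names are hypothetical. The CPU-bound model is deliberately simplified: it assumes every session wants CPU for the whole interval and there are no other non-idle waits, which is the situation in the 2-CPU example.

```python
def average_active_sessions(db_time_min: float, elapsed_min: float) -> float:
    """AAS = DB Time / Elapsed Time (both in the same unit)."""
    return db_time_min / elapsed_min


def cpu_bound_db_time(sessions: int, logical_cpus: int, elapsed_min: float) -> dict:
    """Break down DB Time when every session wants CPU for the whole interval.

    Simplifying assumption: no other non-idle waits, so any session beyond
    the CPU count spends the interval waiting on the CPU run queue.
    """
    db_cpu = min(sessions, logical_cpus) * elapsed_min
    cpu_queue = max(sessions - logical_cpus, 0) * elapsed_min
    non_idle_wait = 0.0
    # DB Time = DB CPU + Non-Idle Wait + Wait on CPU queue
    return {
        "db_cpu": db_cpu,
        "cpu_queue": cpu_queue,
        "db_time": db_cpu + non_idle_wait + cpu_queue,
    }


print("60 min DB Time over 60 min:", average_active_sessions(60, 60))
print(" 1 min DB Time over 60 min:", average_active_sessions(1, 60))

for sessions in (2, 3):
    parts = cpu_bound_db_time(sessions, logical_cpus=2, elapsed_min=60)
    aas = average_active_sessions(parts["db_time"], 60)
    print(f"{sessions} sessions on 2 CPUs: {parts} -> AAS {aas}")
```

With 2 sessions the AAS matches the CPU count. With 3 sessions the extra 60 minutes appears as run-queue wait rather than DB CPU, which is why AAS above the number of logical CPUs is worth a closer look.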