Speaker: Bianca Schroeder (Professor at University of Toronto, Canada)
With the rapid development of SSDs
- Storage landscape has changed
This talk focuses on another angle: storage (SSD) reliability.
- Take a look at flash reliability in the wild.
- How can we protect against flash failures?
- [FAST ’20] Drives in NetApp enterprise systems.
- Average annual replacement rate (ARR) is 0.22%, but rates vary widely
- much lower than HDDs
- All SSDs experience bit errors.
- ECC corrects them, but some errors are uncorrectable
- without data redundancy, the data is lost
SLC, MLC, eMLC and TLC
Different flash types have different ARRs; TLC has the highest replacement rate.
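To make the ARR comparison concrete, here is a minimal Python sketch. It assumes ARR is defined as the number of replacements divided by drive-years in the field, expressed as a percentage. The function name and the fleet counts are hypothetical placeholders, not figures from the talk.

```python
def annual_replacement_rate(replacements: int, drive_days: float) -> float:
    """Return ARR in percent: replacements per drive-year of operation."""
    drive_years = drive_days / 365.0
    if drive_years <= 0:
        raise ValueError("drive_days must be positive")
    return 100.0 * replacements / drive_years


# Hypothetical fleet data (NOT from the talk): type -> (replacements, drive-days)
fleet = {
    "type_A": (2, 365_000),
    "type_B": (11, 730_000),
}

# ARR per drive type, so types with different fleet sizes can be compared
arr_by_type = {
    name: annual_replacement_rate(repl, days)
    for name, (repl, days) in fleet.items()
}
```

Normalizing by drive-years rather than by drive count keeps the comparison fair when some drive types have been deployed for much longer than others.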
- Usage affects the reliability of SSDs, due to wear-out of their cells.
- Hardware failure (‘bathtub curve’): SSDs have higher failure rates in early life and again at wear-out.
- Compare individual firmware versions within the same family
Firmware version (FV) has a tremendous impact on reliability (by a factor of 3-10x)!
RAID: single parity tolerates one failure, double parity -> up to two failures
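As a reminder of why a single parity block protects against exactly one drive failure, here is a small Python sketch of XOR parity. The helper name `xor_blocks` and the toy block contents are made up for illustration; real RAID implementations work on stripes and use different codes for double parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy data blocks, one per drive (illustrative values only)
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Drive 1 fails: XOR of the surviving blocks and the parity rebuilds it
rebuilt = xor_blocks([data[0], data[2], parity])
```

If a second drive fails before the rebuild completes, one XOR equation has two unknowns and the data cannot be recovered, which is why the timing of double failures matters.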
How common are double failures?
- RAID group size
- How frequently do double failures occur?
- How quickly after the first failure?
- From the CDF (time difference in days vs. cumulative probability), 46% of successive failures occur on the same day!
- How are they related to RAID group size?
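The questions above come down to an empirical CDF over the time gaps between successive failures in the same RAID group. A minimal Python sketch follows. The group names and failure days are invented purely to exercise the code and do not reproduce the 46% figure from the talk.

```python
def successive_gaps(failure_days):
    """Return gaps (in days) between consecutive failures in one RAID group."""
    days = sorted(failure_days)
    return [later - earlier for earlier, later in zip(days, days[1:])]


def empirical_cdf(gaps, x):
    """Fraction of gaps that are <= x days."""
    if not gaps:
        raise ValueError("need at least one gap")
    return sum(g <= x for g in gaps) / len(gaps)


# Hypothetical failure days per RAID group (NOT data from the talk)
groups = {"rg0": [10, 10, 42], "rg1": [5, 200], "rg2": [7, 7]}

# Pool the gaps from all groups, then read the CDF at 0 days (same day)
gaps = [g for days in groups.values() for g in successive_gaps(days)]
same_day_fraction = empirical_cdf(gaps, 0)
```

Grouping the same gaps by RAID group size before computing the CDF would be one way to approach the last question in the list.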
- Prediction: model (NN, Random Forest…)
- Usage: improve scrubbing
- Standard (fixed) scrub rate -> dynamically adjusted by a factor X based on the prediction
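One possible reading of prediction-driven scrubbing, sketched in Python: keep the fixed interval by default and shorten it by a factor X when a model predicts a high failure risk. The function name, the threshold, and the default value of X are assumptions for illustration; the notes do not say how the talk's system chooses them.

```python
def scrub_interval_hours(base_interval_hours: float,
                         failure_probability: float,
                         threshold: float = 0.5,   # assumed cut-off, not from the talk
                         factor_x: float = 4.0     # assumed speed-up factor X
                         ) -> float:
    """Return the scrub interval, shortened by factor X for at-risk drives."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be in [0, 1]")
    if failure_probability >= threshold:
        return base_interval_hours / factor_x
    return base_interval_hours
```

For example, with a weekly baseline (168 hours), a drive predicted at 0.9 failure probability would be scrubbed every 42 hours under these assumed settings, while a drive at 0.1 stays on the weekly schedule.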
Interesting point: the simpler model (Random Forest) performs better than the NN. This was also raised as Question 3 by Zekai Sun from HKU.
- Major difference between DiLOS and LegoOS, which also uses memory disaggregation
Speaker: Kyle C. Hale (Illinois Institute of Technology)
Disaggregation at the Edge ->
- In Cyber Foraging, user devices would “live off the land”
- Applications would be modified to partition into disjoint components, which are offloaded, sometimes using VMs
- Cloud offload, but with mobility
This might be a “chicken or the egg” problem: …
Ref: J. Flinn, “Cyber Foraging: Fifteen Years Later,” IEEE Pervasive Computing
- Wireless latency continues to drop
Disaggregated resources at the Edge -> Ephemeral Single-System Image at the Edge
Principles and Characteristics
- Transparency: user is not aware of coalescence
- Performance: offload should occur only when it improves performance
- Resilience: nodes come and go often
- Customizability: types of resources, when, at what cost…
- Privacy and Security: same problems as in IaaS
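The performance principle above can be read as a simple cost comparison: offload only when the estimated remote completion time, including network overhead, beats local execution. The Python sketch below is a hypothetical decision rule under that assumption; the function and parameter names are not from the talk, and a real system would also have to weigh resilience and cost.

```python
def should_offload(local_s: float, remote_s: float,
                   payload_bytes: int, bandwidth_bps: float,
                   rtt_s: float) -> bool:
    """Return True only if offloading is estimated to finish sooner."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth_bps must be positive")
    transfer_s = payload_bytes * 8 / bandwidth_bps  # time to ship the input
    return rtt_s + transfer_s + remote_s < local_s


# Heavy task over a fast link: offloading is estimated to pay off
fast_link = should_offload(2.0, 0.5, 1_000_000, 100e6, 0.01)

# Light task over a slow link: the transfer cost dominates, so stay local
slow_link = should_offload(0.1, 0.05, 1_000_000, 10e6, 0.02)
```

As wireless latency and bandwidth improve, the transfer term shrinks and more tasks cross the offload threshold.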
- Manually migrated 4 microservice applications
- complex and stateful
Increase the load until saturated
- An automatic tool is helpful if it can
- locate the call sites of the caller
- locate the inner handlers
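As a rough idea of what locating call sites could look like, here is a Python sketch built on the standard `ast` module. It assumes the services are written in Python and matches calls by name only, which are simplifications of mine; the notes do not describe the tool's actual design, and the sample source and names (`checkout`, `price_service`, `get_total`) are made up.

```python
import ast


def find_call_sites(source: str, func_name: str):
    """Return sorted (line, column) pairs for each call to func_name."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name):         # plain call: get_total(...)
            name = target.id
        elif isinstance(target, ast.Attribute):  # method call: obj.get_total(...)
            name = target.attr
        else:
            name = None
        if name == func_name:
            sites.append((node.lineno, node.col_offset))
    return sorted(sites)


# Hypothetical service code used only to exercise the function
example = '''
def checkout(cart):
    total = price_service.get_total(cart)
    return get_total(cart) + total
'''

call_sites = find_call_sites(example, "get_total")
```

Name-based matching misses aliased imports and dynamic dispatch, so a practical tool would likely need type or import resolution on top of this.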
Future: automatic tools, instead of manual adaptation, to port microservices to serverless